2023-01-11T21:14:24.9265658Z Requested labels: linux.8xlarge.nvidia.gpu
2023-01-11T21:14:24.9265770Z Job defined at: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/tags/ciflow/trunk/91627
2023-01-11T21:14:24.9265956Z Reusable workflow chain:
2023-01-11T21:14:24.9265996Z pytorch/pytorch/.github/workflows/trunk.yml@refs/tags/ciflow/trunk/91627 (8419ddda87c8a47eacc63b54bc7ec98c1f27c26e)
2023-01-11T21:14:24.9266045Z -> pytorch/pytorch/.github/workflows/_linux-test.yml@refs/tags/ciflow/trunk/91627 (8419ddda87c8a47eacc63b54bc7ec98c1f27c26e)
2023-01-11T21:14:24.9266079Z Waiting for a runner to pick up this job...
2023-01-11T21:14:25.2466501Z Job is about to start running on the runner: i-0a2cfe12f2970a977 (organization)
2023-01-11T21:14:30.4593124Z Current runner version: '2.300.2'
2023-01-11T21:14:30.4600929Z Runner name: 'i-0a2cfe12f2970a977'
2023-01-11T21:14:30.4601638Z Runner group name: 'Default'
2023-01-11T21:14:30.4602379Z Machine name: 'ip-10-0-0-121'
2023-01-11T21:14:30.4605201Z ##[group]GITHUB_TOKEN Permissions
2023-01-11T21:14:30.4606167Z Actions: write
2023-01-11T21:14:30.4606561Z Checks: write
2023-01-11T21:14:30.4607004Z Contents: write
2023-01-11T21:14:30.4607495Z Deployments: write
2023-01-11T21:14:30.4607898Z Discussions: write
2023-01-11T21:14:30.4608331Z Issues: write
2023-01-11T21:14:30.4608760Z Metadata: read
2023-01-11T21:14:30.4609148Z Packages: write
2023-01-11T21:14:30.4609613Z Pages: write
2023-01-11T21:14:30.4610092Z PullRequests: write
2023-01-11T21:14:30.4610543Z RepositoryProjects: write
2023-01-11T21:14:30.4611027Z SecurityEvents: write
2023-01-11T21:14:30.4611470Z Statuses: write
2023-01-11T21:14:30.4611889Z ##[endgroup]
2023-01-11T21:14:30.4616620Z Secret source: Actions
2023-01-11T21:14:30.4617582Z Prepare workflow directory
2023-01-11T21:14:30.8206776Z Prepare all required actions
2023-01-11T21:14:30.8443672Z Getting action download info
2023-01-11T21:14:31.0571486Z Download action repository 'pytorch/test-infra@main' (SHA:2c225610d00fb13c04fcd60389d3e4d8326167c3)
2023-01-11T21:14:31.3843623Z Download action repository 'pytorch/pytorch@master' (SHA:c5836153f5332ca83d5cacde38f2829a4d54793e)
2023-01-11T21:14:35.0673869Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2023-01-11T21:14:35.3788905Z Getting action download info
2023-01-11T21:14:35.6323882Z Download action repository 'malfet/checkout@silent-checkout' (SHA:c7b8fef48edfe1bca0044a44b1f7f7c4318a3076)
2023-01-11T21:14:35.8167973Z Getting action download info
2023-01-11T21:14:36.0097434Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482)
2023-01-11T21:14:36.1707086Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml
2023-01-11T21:14:36.1710157Z ##[group] Inputs
2023-01-11T21:14:36.1710606Z build-environment: linux-bionic-cuda11.7-py3.10-gcc7
2023-01-11T21:14:36.1712120Z test-matrix: { include: [ { config: "default", shard: 1, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "default", shard: 2, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "default", shard: 3, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "default", shard: 4, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "functorch", shard: 1, num_shards: 1, runner: "linux.4xlarge.nvidia.gpu" }, { config: "nogpu_AVX512", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, { config: "nogpu_NO_AVX2", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, { config: "jit_legacy", shard: 1, num_shards: 1, runner: "linux.4xlarge.nvidia.gpu" }, { config: "distributed", shard: 1, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, { config: "distributed", shard: 2, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, { config: "distributed", shard: 3, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, ]}
2023-01-11T21:14:36.1713693Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd
2023-01-11T21:14:36.1714177Z sync-tag:
2023-01-11T21:14:36.1715737Z timeout-minutes: 240
2023-01-11T21:14:36.1716038Z use-gha:
2023-01-11T21:14:36.1716380Z ##[endgroup]
2023-01-11T21:14:36.1717405Z Complete job name: linux-bionic-cuda11.7-py3.10-gcc7 / test (distributed, 3, 3, linux.8xlarge.nvidia.gpu)
2023-01-11T21:14:36.2845161Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main
2023-01-11T21:14:36.2845557Z with:
2023-01-11T21:14:36.2846152Z github-secret: ***
2023-01-11T21:14:36.2846620Z instructions: All testing is done inside the container, to start an interactive session run: docker exec -it $(docker container ps --format '{{.ID}}') bash
2023-01-11T21:14:36.2847083Z activate-with-label: false
2023-01-11T21:14:36.2847536Z label: with-ssh
2023-01-11T21:14:36.2847808Z remove-existing-keys: true
2023-01-11T21:14:36.2848060Z env:
2023-01-11T21:14:36.2848310Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:14:36.2848578Z ##[endgroup]
2023-01-11T21:14:36.3931957Z ciflow reference detected, attempting to extract PR number
2023-01-11T21:14:36.8683197Z Grabbing public ssh keys from https://github.com/pytorch-bot[bot].keys
2023-01-11T21:14:36.9625221Z No SSH keys found for user pytorch-bot[bot]
2023-01-11T21:14:36.9625623Z Grabbing public ssh keys from https://github.com/LucaLumetti.keys
2023-01-11T21:14:37.0483859Z ~/.ssh/authorized_keys file found on node, removing ~/.ssh and starting fresh
2023-01-11T21:14:37.0509321Z Public keys pulled and installed to /home/ec2-user/.ssh/authorized_keys
2023-01-11T21:14:37.0565627Z Login using: ssh ec2-user@ec2-54-227-83-11.compute-1.amazonaws.com
2023-01-11T21:14:37.0566742Z All testing is done inside the container, to start an interactive session run:
2023-01-11T21:14:37.0567278Z docker exec -it $(docker container ps --format '{{.ID}}') bash
2023-01-11T21:14:37.0882540Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@master
2023-01-11T21:14:37.0882930Z with:
2023-01-11T21:14:37.0883162Z submodules: recursive
2023-01-11T21:14:37.0883430Z fetch-depth: 0
2023-01-11T21:14:37.0883672Z env:
2023-01-11T21:14:37.0883897Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:14:37.0884166Z ##[endgroup]
2023-01-11T21:14:37.1187727Z ##[group]Run retry () {
2023-01-11T21:14:37.1188055Z retry () {
2023-01-11T21:14:37.1188373Z  $* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
2023-01-11T21:14:37.1188675Z }
2023-01-11T21:14:37.1188921Z echo "${GITHUB_WORKSPACE}"
2023-01-11T21:14:37.1189223Z if [ -z "${NO_SUDO}" ]; then
2023-01-11T21:14:37.1189535Z  retry sudo rm -rf "${GITHUB_WORKSPACE}"
2023-01-11T21:14:37.1189805Z else
2023-01-11T21:14:37.1190082Z  retry rm -rf "${GITHUB_WORKSPACE}"
2023-01-11T21:14:37.1190369Z fi
2023-01-11T21:14:37.1190651Z mkdir "${GITHUB_WORKSPACE}"
2023-01-11T21:14:37.1210066Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2023-01-11T21:14:37.1210370Z env:
2023-01-11T21:14:37.1210624Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:14:37.1210880Z NO_SUDO:
2023-01-11T21:14:37.1211100Z ##[endgroup]
2023-01-11T21:14:37.1339775Z /home/ec2-user/actions-runner/_work/pytorch/pytorch
2023-01-11T21:14:40.1657353Z ##[group]Run malfet/checkout@silent-checkout
2023-01-11T21:14:40.1657650Z with:
2023-01-11T21:14:40.1657930Z ref: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:14:40.1658204Z fetch-depth: 0
2023-01-11T21:14:40.1658454Z submodules: recursive
2023-01-11T21:14:40.1658716Z quiet-checkout: true
2023-01-11T21:14:40.1658976Z repository: pytorch/pytorch
2023-01-11T21:14:40.1659434Z token: ***
2023-01-11T21:14:40.1659684Z ssh-strict: true
2023-01-11T21:14:40.1659949Z persist-credentials: true
2023-01-11T21:14:40.1660208Z clean: true
2023-01-11T21:14:40.1660443Z lfs: false
2023-01-11T21:14:40.1660696Z set-safe-directory: true
2023-01-11T21:14:40.1660926Z env:
2023-01-11T21:14:40.1661161Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:14:40.1661419Z ##[endgroup]
2023-01-11T21:14:40.3178657Z Syncing repository: pytorch/pytorch
2023-01-11T21:14:40.3180510Z ##[group]Getting Git version info
2023-01-11T21:14:40.3181058Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2023-01-11T21:14:40.3181658Z [command]/usr/bin/git version
2023-01-11T21:14:40.3181912Z git version 2.38.1
2023-01-11T21:14:40.3193699Z ##[endgroup]
2023-01-11T21:14:40.3213793Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/5c8bdb8c-9e48-43f8-89ae-b5e91286ca0e' before making global git config changes
2023-01-11T21:14:40.3215523Z Adding repository directory to the temporary git global config as a safe directory
2023-01-11T21:14:40.3221830Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2023-01-11T21:14:40.3268648Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2023-01-11T21:14:40.3275567Z ##[group]Initializing the repository
2023-01-11T21:14:40.3279651Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2023-01-11T21:14:40.3313048Z hint: Using 'master' as the name for the initial branch. This default branch name
2023-01-11T21:14:40.3313469Z hint: is subject to change. To configure the initial branch name to use in all
2023-01-11T21:14:40.3313908Z hint: of your new repositories, which will suppress this warning, call:
2023-01-11T21:14:40.3314227Z hint:
2023-01-11T21:14:40.3314581Z hint: git config --global init.defaultBranch
2023-01-11T21:14:40.3315037Z hint:
2023-01-11T21:14:40.3315433Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2023-01-11T21:14:40.3315932Z hint: 'development'. The just-created branch can be renamed via this command:
2023-01-11T21:14:40.3316242Z hint:
2023-01-11T21:14:40.3316681Z hint: git branch -m
2023-01-11T21:14:40.3317207Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2023-01-11T21:14:40.3327860Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2023-01-11T21:14:40.3363640Z ##[endgroup]
2023-01-11T21:14:40.3364135Z ##[group]Disabling automatic garbage collection
2023-01-11T21:14:40.3368492Z [command]/usr/bin/git config --local gc.auto 0
2023-01-11T21:14:40.3400451Z ##[endgroup]
2023-01-11T21:14:40.3401611Z ##[group]Setting up auth
2023-01-11T21:14:40.3411163Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2023-01-11T21:14:40.3446880Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :
2023-01-11T21:14:40.3750652Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2023-01-11T21:14:40.3785176Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
2023-01-11T21:14:40.4088054Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2023-01-11T21:14:40.4141793Z ##[endgroup]
2023-01-11T21:14:40.4142286Z ##[group]Fetching the repository
2023-01-11T21:14:40.4151251Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --quiet --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2023-01-11T21:15:36.7706462Z [command]/usr/bin/git rev-parse --verify --quiet 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e^{object}
2023-01-11T21:15:36.7737229Z 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:15:36.7745268Z ##[endgroup]
2023-01-11T21:15:36.7745834Z ##[group]Determining the checkout info
2023-01-11T21:15:36.7746389Z ##[endgroup]
2023-01-11T21:15:36.7746888Z ##[group]Checking out the ref
2023-01-11T21:15:36.7751233Z [command]/usr/bin/git checkout --quiet --force 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:15:38.5517974Z ##[endgroup]
2023-01-11T21:15:38.5518779Z ##[group]Setting up auth for fetching submodules
2023-01-11T21:15:38.5524895Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic ***
2023-01-11T21:15:38.5580073Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf
2023-01-11T21:15:38.5613717Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com:
2023-01-11T21:15:38.5647021Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com:
2023-01-11T21:15:38.5678894Z ##[endgroup]
2023-01-11T21:15:38.5679545Z ##[group]Fetching submodules
2023-01-11T21:15:38.5684884Z [command]/usr/bin/git submodule sync --recursive
2023-01-11T21:15:38.6009414Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive
2023-01-11T21:15:38.6314489Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni'
2023-01-11T21:15:38.6316904Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16'
2023-01-11T21:15:38.6319838Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv'
2023-01-11T21:15:38.6322935Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK'
2023-01-11T21:15:38.6326262Z Submodule 'third_party/QNNPACK' (https://github.com/pytorch/QNNPACK) registered for path 'third_party/QNNPACK'
2023-01-11T21:15:38.6329876Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator'
2023-01-11T21:15:38.6333767Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK'
2023-01-11T21:15:38.6337742Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark'
2023-01-11T21:15:38.6341598Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo'
2023-01-11T21:15:38.6345582Z Submodule 'third_party/cub' (https://github.com/NVlabs/cub.git) registered for path 'third_party/cub'
2023-01-11T21:15:38.6349829Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend'
2023-01-11T21:15:38.6353939Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass'
2023-01-11T21:15:38.6358249Z Submodule 'third_party/eigen' (https://gitlab.com/libeigen/eigen.git) registered for path 'third_party/eigen'
2023-01-11T21:15:38.6362964Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm'
2023-01-11T21:15:38.6367449Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers'
2023-01-11T21:15:38.6372424Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt'
2023-01-11T21:15:38.6377917Z Submodule 'third_party/foxi' (https://github.com/houseroad/foxi.git) registered for path 'third_party/foxi'
2023-01-11T21:15:38.6382995Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp'
2023-01-11T21:15:38.6388058Z Submodule 'third_party/gloo' (https://github.com/facebookincubator/gloo) registered for path 'third_party/gloo'
2023-01-11T21:15:38.6393540Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest'
2023-01-11T21:15:38.6398961Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep'
2023-01-11T21:15:38.6404593Z Submodule 'third_party/ios-cmake' (https://github.com/Yangqing/ios-cmake.git) registered for path 'third_party/ios-cmake'
2023-01-11T21:15:38.6410293Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi'
2023-01-11T21:15:38.6416569Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto'
2023-01-11T21:15:38.6422746Z Submodule 'third_party/nccl/nccl' (https://github.com/NVIDIA/nccl) registered for path 'third_party/nccl/nccl'
2023-01-11T21:15:38.6428950Z Submodule 'third_party/neon2sse' (https://github.com/intel/ARM_NEON_2_x86_SSE.git) registered for path 'third_party/neon2sse'
2023-01-11T21:15:38.6435218Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann'
2023-01-11T21:15:38.6441454Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx'
2023-01-11T21:15:38.6448005Z Submodule 'third_party/onnx-tensorrt' (https://github.com/onnx/onnx-tensorrt) registered for path 'third_party/onnx-tensorrt'
2023-01-11T21:15:38.6455088Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft'
2023-01-11T21:15:38.6461920Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf'
2023-01-11T21:15:38.6468715Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd'
2023-01-11T21:15:38.6475795Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool'
2023-01-11T21:15:38.6482871Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11'
2023-01-11T21:15:38.6490248Z Submodule 'third_party/python-enum' (https://github.com/PeachPy/enum34.git) registered for path 'third_party/python-enum'
2023-01-11T21:15:38.6498166Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy'
2023-01-11T21:15:38.6505623Z Submodule 'third_party/python-six' (https://github.com/benjaminp/six.git) registered for path 'third_party/python-six'
2023-01-11T21:15:38.6513189Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef'
2023-01-11T21:15:38.6521043Z Submodule 'third_party/tbb' (https://github.com/01org/tbb) registered for path 'third_party/tbb'
2023-01-11T21:15:38.6529135Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe'
2023-01-11T21:15:38.6538145Z Submodule 'third_party/zstd' (https://github.com/facebook/zstd.git) registered for path 'third_party/zstd'
2023-01-11T21:15:38.6567376Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'...
2023-01-11T21:15:38.9768455Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'...
2023-01-11T21:15:39.2049689Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'...
2023-01-11T21:15:39.4241184Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'...
2023-01-11T21:15:39.7866540Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/QNNPACK'...
2023-01-11T21:15:40.0866621Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'...
2023-01-11T21:15:42.2527286Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'...
2023-01-11T21:15:48.0742176Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'...
2023-01-11T21:15:48.7300451Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'...
2023-01-11T21:15:49.2876013Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cub'...
2023-01-11T21:15:50.8105382Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'...
2023-01-11T21:15:52.1731981Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'...
2023-01-11T21:15:53.5640851Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/eigen'...
2023-01-11T21:16:00.5786216Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'...
2023-01-11T21:16:01.3625572Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'...
2023-01-11T21:16:03.1548246Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'...
2023-01-11T21:16:04.3972425Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/foxi'...
2023-01-11T21:16:05.4009294Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'...
2023-01-11T21:16:05.9069679Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'...
2023-01-11T21:16:06.3137984Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'...
2023-01-11T21:16:07.2830897Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'...
2023-01-11T21:16:07.8204999Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ios-cmake'...
2023-01-11T21:16:08.0535519Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'...
2023-01-11T21:16:08.3588475Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'...
2023-01-11T21:16:09.9680051Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nccl/nccl'...
2023-01-11T21:16:11.4609787Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/neon2sse'...
2023-01-11T21:16:12.0201989Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'...
2023-01-11T21:16:18.0118561Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'...
2023-01-11T21:16:19.7223140Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt'...
2023-01-11T21:16:20.1915655Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'...
2023-01-11T21:16:20.4471905Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'...
2023-01-11T21:16:26.5818744Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'...
2023-01-11T21:16:26.8208803Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'...
2023-01-11T21:16:27.0865622Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'...
2023-01-11T21:16:28.0253811Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-enum'...
2023-01-11T21:16:28.2666625Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'...
2023-01-11T21:16:28.6206443Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-six'...
2023-01-11T21:16:28.9433213Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'...
2023-01-11T21:16:29.5361499Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tbb'...
2023-01-11T21:16:31.7623399Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'...
2023-01-11T21:16:32.2770918Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/zstd'...
2023-01-11T21:16:34.7438744Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f'
2023-01-11T21:16:34.7565312Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3'
2023-01-11T21:16:34.7662983Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1'
2023-01-11T21:16:34.7948498Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73'
2023-01-11T21:16:34.8223017Z Submodule path 'third_party/QNNPACK': checked out '7d2a4e9931a82adc3814275b6219a03e24e36b4c'
2023-01-11T21:16:34.8657583Z Submodule path 'third_party/VulkanMemoryAllocator': checked out 'a6bfc237255a6bac1513f7c1ebde6d8aed6b5191'
2023-01-11T21:16:35.6671443Z Submodule path 'third_party/XNNPACK': checked out 'ae108ef49aa5623b896fc93d4298c49d1750d9ba'
2023-01-11T21:16:35.6922025Z Submodule path 'third_party/benchmark': checked out '0d98dba29d66e93259db7daa53a9327df767a415'
2023-01-11T21:16:35.8148587Z Submodule path 'third_party/cpuinfo': checked out '8ec7bd91ad0470e61cf38f618cc1f270dede599c'
2023-01-11T21:16:35.8555512Z Submodule path 'third_party/cub': checked out 'd106ddb991a56c3df1b6d51b2409e36ba8181ce4'
2023-01-11T21:16:36.2142112Z Submodule path 'third_party/cudnn_frontend': checked out '171a7a986f7fbd9ed71bd0cf3c7ad4f55843d6b3'
2023-01-11T21:16:36.7592276Z Submodule path 'third_party/cutlass': checked out 'b72cbf957df8cf84a6d0ff91c190ad51a9c1d24a'
2023-01-11T21:16:37.0624742Z Submodule path 'third_party/eigen': checked out '3147391d946bb4b6c68edd901f2add6ac1f31f8c'
2023-01-11T21:16:37.1184037Z Submodule path 'third_party/fbgemm': checked out '80d64206c07879fd4683be66873de7cefa1a0a71'
2023-01-11T21:16:37.1202117Z Submodule 'third_party/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/third_party/asmjit'
2023-01-11T21:16:37.1205348Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/third_party/cpuinfo'
2023-01-11T21:16:37.1208805Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/third_party/googletest'
2023-01-11T21:16:37.1213276Z Submodule 'third_party/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 'third_party/fbgemm/third_party/hipify_torch'
2023-01-11T21:16:37.1239972Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/asmjit'...
2023-01-11T21:16:38.2293866Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/cpuinfo'...
2023-01-11T21:16:38.7832508Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/googletest'...
2023-01-11T21:16:39.7538267Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/hipify_torch'...
2023-01-11T21:16:40.1188507Z Submodule path 'third_party/fbgemm/third_party/asmjit': checked out 'd3fbf7c9bc7c1d1365a94a45614b91c5a3706b81'
2023-01-11T21:16:40.2437657Z Submodule path 'third_party/fbgemm/third_party/cpuinfo': checked out 'ed8b86a253800bafdb7b25c5c399f91bff9cb1f3'
2023-01-11T21:16:40.3145254Z Submodule path 'third_party/fbgemm/third_party/googletest': checked out 'cbf019de22c8dd37b2108da35b2748fd702d1796'
2023-01-11T21:16:40.3263768Z Submodule path 'third_party/fbgemm/third_party/hipify_torch': checked out '1840658c184f3eeba787dae0f06c45756c1daaf5'
2023-01-11T21:16:40.4318345Z Submodule path 'third_party/flatbuffers': checked out 'd0cede9c90c5257537c293517a21376408b549fa'
2023-01-11T21:16:40.4760068Z Submodule path 'third_party/fmt': checked out '7bdf0628b1276379886c7f6dda2cef2b3b374f0b'
2023-01-11T21:16:40.4861571Z Submodule path 'third_party/foxi': checked out 'c278588e34e535f0bb8f00df3880d26928038cad'
2023-01-11T21:16:40.5341953Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350'
2023-01-11T21:16:40.5636987Z Submodule path 'third_party/gloo': checked out '4a5e339b764261d20fc409071dc7a8b8989aa195'
2023-01-11T21:16:40.6187936Z Submodule path 'third_party/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
2023-01-11T21:16:40.6322022Z Submodule path 'third_party/ideep': checked out 'e533c771a1e75a1c225c14b2261eefa62681d9e6'
2023-01-11T21:16:40.6339271Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn'
2023-01-11T21:16:40.6365710Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'...
2023-01-11T21:16:50.1085346Z Submodule path 'third_party/ideep/mkl-dnn': checked out '404ad76ee633c939d705eb583ffe50a806969d5e'
2023-01-11T21:16:50.1106062Z Submodule 'third_party/oneDNN' (https://github.com/oneapi-src/oneDNN.git) registered for path 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2023-01-11T21:16:50.1136632Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN'...
2023-01-11T21:16:58.9061707Z Submodule path 'third_party/ideep/mkl-dnn/third_party/oneDNN': checked out 'fbec3e25a559ee252022ae066817b204e106a6ba'
2023-01-11T21:16:58.9180759Z Submodule path 'third_party/ios-cmake': checked out '8abaed637d56f1337d6e1d2c4026e25c1eade724'
2023-01-11T21:16:58.9352630Z Submodule path 'third_party/ittapi': checked out '5b8a7d7422611c3a0d799fb5fc5dd4abfae35b42'
2023-01-11T21:16:59.0478376Z Submodule path 'third_party/kineto': checked out '6c1629809068efd78a8d56b4aa479c7ec49ae562'
2023-01-11T21:16:59.0497030Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt'
2023-01-11T21:16:59.0500994Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest'
2023-01-11T21:16:59.0527821Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'...
2023-01-11T21:17:00.2058979Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'...
2023-01-11T21:17:01.2414285Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '2591ab91c3898c9f6544fff04660276537d32ffd'
2023-01-11T21:17:01.3071362Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347'
2023-01-11T21:17:01.3331523Z Submodule path 'third_party/nccl/nccl': checked out 'f89fd4777d2ef9229c039ff750ae21da01626f52'
2023-01-11T21:17:01.3487854Z Submodule path 'third_party/neon2sse': checked out '97a126f08ce318023be604d03f88bf0820a9464a'
2023-01-11T21:17:01.4797930Z Submodule path 'third_party/nlohmann': checked out '87cda1d6646592ac5866dc703c8e1839046a6806'
2023-01-11T21:17:01.8006260Z Submodule path 'third_party/onnx': checked out 'f7ee1ac60d06abe8e26c9b6bbe1e3db5286b614b'
2023-01-11T21:17:01.8038249Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx/third_party/benchmark'
2023-01-11T21:17:01.8041291Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11'
2023-01-11T21:17:01.8069167Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/benchmark'...
2023-01-11T21:17:02.4593148Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'...
2023-01-11T21:17:04.1244005Z Submodule path 'third_party/onnx/third_party/benchmark': checked out '0d98dba29d66e93259db7daa53a9327df767a415'
2023-01-11T21:17:04.1631053Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'ffa346860b306c9bbfb341aed9c14c067751feb8'
2023-01-11T21:17:04.1812262Z Submodule path 'third_party/onnx-tensorrt': checked out 'c153211418a7c57ce071d9ce2a41f8d1c85a878f'
2023-01-11T21:17:04.1830824Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx'
2023-01-11T21:17:04.1857883Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx'...
2023-01-11T21:17:06.0416653Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx': checked out '765f5ee823a67a866f4bd28a9860e81f3c811ce8'
2023-01-11T21:17:06.0439135Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2023-01-11T21:17:06.0442263Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2023-01-11T21:17:06.0470164Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'...
2023-01-11T21:17:06.6311514Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'...
2023-01-11T21:17:07.5543381Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508' 2023-01-11T21:17:07.6340603Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11': checked out 'a1041190c8b8ff0cd9e2f0752248ad5e3789ea0c' 2023-01-11T21:17:07.6358013Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:17:07.6385461Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'... 2023-01-11T21:17:09.5597583Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2023-01-11T21:17:09.5816381Z Submodule path 'third_party/pocketfft': checked out 'ea778e37710c07723435b1be58235996d1d43a5a' 2023-01-11T21:17:09.9086324Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2023-01-11T21:17:09.9109419Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:17:09.9112788Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2023-01-11T21:17:09.9141342Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 2023-01-11T21:17:10.3920994Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 
2023-01-11T21:17:11.4655136Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2023-01-11T21:17:11.5485673Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2023-01-11T21:17:11.5581385Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2023-01-11T21:17:11.5708877Z Submodule path 'third_party/pthreadpool': checked out 'a134dd5d4cee80cce15db81a72e7f929d71dd413' 2023-01-11T21:17:11.6105444Z Submodule path 'third_party/pybind11': checked out '80dc998efced8ceb2be59756668a7e90e8bef917' 2023-01-11T21:17:11.6204724Z Submodule path 'third_party/python-enum': checked out '4cfedc426c4e2fc52e3f5c2b4297e15ed8d6b8c7' 2023-01-11T21:17:11.6532653Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2023-01-11T21:17:11.6635765Z Submodule path 'third_party/python-six': checked out '15e31431af97e5e64b80af0a3f598d382bcdd49a' 2023-01-11T21:17:11.7163801Z Submodule path 'third_party/sleef': checked out 'e0a003ee838b75d11763aa9c3ef17bf71a725bff' 2023-01-11T21:17:11.8564039Z Submodule path 'third_party/tbb': checked out 'a51a90bc609bb73db8ea13841b5cf7aa4344d4a9' 2023-01-11T21:17:11.8885382Z Submodule path 'third_party/tensorpipe': checked out '52791a2fd214b2a9dc5759d36725909c1daa7f2e' 2023-01-11T21:17:11.8903218Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:17:11.8906385Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:17:11.8909554Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:17:11.8912999Z Submodule 'third_party/pybind11' 
(https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:17:11.8939996Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 2023-01-11T21:17:12.9475405Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 2023-01-11T21:17:14.3996605Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 2023-01-11T21:17:15.6625765Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2023-01-11T21:17:16.6255425Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2023-01-11T21:17:16.6424641Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2023-01-11T21:17:16.7192321Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '1dff88e5161cba5c59276d2070d2e304e4dcb242' 2023-01-11T21:17:16.7519060Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2023-01-11T21:17:16.7536094Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:17:16.7563385Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 
2023-01-11T21:17:16.9923329Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2023-01-11T21:17:17.1557941Z Submodule path 'third_party/zstd': checked out 'aec56a52fbab207fc639a1937d1e708a282edca8' 2023-01-11T21:17:17.1590056Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2023-01-11T21:17:17.1915029Z Entering 'android/libs/fbjni' 2023-01-11T21:17:17.1958401Z Entering 'third_party/FP16' 2023-01-11T21:17:17.2001599Z Entering 'third_party/FXdiv' 2023-01-11T21:17:17.2045211Z Entering 'third_party/NNPACK' 2023-01-11T21:17:17.2088650Z Entering 'third_party/QNNPACK' 2023-01-11T21:17:17.2132146Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T21:17:17.2176463Z Entering 'third_party/XNNPACK' 2023-01-11T21:17:17.2230787Z Entering 'third_party/benchmark' 2023-01-11T21:17:17.2275640Z Entering 'third_party/cpuinfo' 2023-01-11T21:17:17.2319677Z Entering 'third_party/cub' 2023-01-11T21:17:17.2363728Z Entering 'third_party/cudnn_frontend' 2023-01-11T21:17:17.2412522Z Entering 'third_party/cutlass' 2023-01-11T21:17:17.2463575Z Entering 'third_party/eigen' 2023-01-11T21:17:17.2509662Z Entering 'third_party/fbgemm' 2023-01-11T21:17:17.2553391Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:17:17.2595652Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:17:17.2638018Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:17:17.2680874Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:17:17.2724265Z Entering 'third_party/flatbuffers' 2023-01-11T21:17:17.2769131Z Entering 'third_party/fmt' 2023-01-11T21:17:17.2811561Z Entering 'third_party/foxi' 2023-01-11T21:17:17.2854423Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:17:17.2898155Z Entering 'third_party/gloo' 2023-01-11T21:17:17.2941863Z Entering 'third_party/googletest' 2023-01-11T21:17:17.2986069Z Entering 'third_party/ideep' 
2023-01-11T21:17:17.3028144Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T21:17:17.3071919Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T21:17:17.3122456Z Entering 'third_party/ios-cmake' 2023-01-11T21:17:17.3165139Z Entering 'third_party/ittapi' 2023-01-11T21:17:17.3207869Z Entering 'third_party/kineto' 2023-01-11T21:17:17.3250366Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:17:17.3293634Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:17:17.3336822Z Entering 'third_party/nccl/nccl' 2023-01-11T21:17:17.3380717Z Entering 'third_party/neon2sse' 2023-01-11T21:17:17.3422841Z Entering 'third_party/nlohmann' 2023-01-11T21:17:17.3466772Z Entering 'third_party/onnx' 2023-01-11T21:17:17.3522068Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T21:17:17.3563992Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T21:17:17.3608174Z Entering 'third_party/onnx-tensorrt' 2023-01-11T21:17:17.3649943Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:17:17.3697677Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:17:17.3740062Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T21:17:17.3781625Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:17:17.3828799Z Entering 'third_party/pocketfft' 2023-01-11T21:17:17.3870872Z Entering 'third_party/protobuf' 2023-01-11T21:17:17.3917950Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:17:17.3959648Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:17:17.4004685Z Entering 'third_party/psimd' 2023-01-11T21:17:17.4047258Z Entering 'third_party/pthreadpool' 2023-01-11T21:17:17.4089116Z Entering 'third_party/pybind11' 2023-01-11T21:17:17.4131546Z Entering 'third_party/python-enum' 2023-01-11T21:17:17.4174080Z Entering 'third_party/python-peachpy' 
2023-01-11T21:17:17.4215584Z Entering 'third_party/python-six' 2023-01-11T21:17:17.4258049Z Entering 'third_party/sleef' 2023-01-11T21:17:17.4301220Z Entering 'third_party/tbb' 2023-01-11T21:17:17.4345935Z Entering 'third_party/tensorpipe' 2023-01-11T21:17:17.4388409Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:17:17.4431547Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:17:17.4473418Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:17:17.4516527Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:17:17.4557907Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:17:17.4602346Z Entering 'third_party/zstd' 2023-01-11T21:17:17.4656435Z ##[endgroup] 2023-01-11T21:17:17.4657038Z ##[group]Persisting credentials for submodules 2023-01-11T21:17:17.4664243Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || : 2023-01-11T21:17:17.4979298Z Entering 'android/libs/fbjni' 2023-01-11T21:17:17.5021080Z Entering 'third_party/FP16' 2023-01-11T21:17:17.5063054Z Entering 'third_party/FXdiv' 2023-01-11T21:17:17.5104717Z Entering 'third_party/NNPACK' 2023-01-11T21:17:17.5146688Z Entering 'third_party/QNNPACK' 2023-01-11T21:17:17.5189141Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T21:17:17.5231265Z Entering 'third_party/XNNPACK' 2023-01-11T21:17:17.5283842Z Entering 'third_party/benchmark' 2023-01-11T21:17:17.5326035Z Entering 'third_party/cpuinfo' 2023-01-11T21:17:17.5369255Z Entering 'third_party/cub' 2023-01-11T21:17:17.5411011Z Entering 'third_party/cudnn_frontend' 2023-01-11T21:17:17.5460027Z Entering 'third_party/cutlass' 2023-01-11T21:17:17.5508960Z Entering 'third_party/eigen' 2023-01-11T21:17:17.5552955Z Entering 'third_party/fbgemm' 2023-01-11T21:17:17.5594317Z Entering 
'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:17:17.5635783Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:17:17.5677473Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:17:17.5719265Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:17:17.5762020Z Entering 'third_party/flatbuffers' 2023-01-11T21:17:17.5806015Z Entering 'third_party/fmt' 2023-01-11T21:17:17.5847607Z Entering 'third_party/foxi' 2023-01-11T21:17:17.5890198Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:17:17.5931932Z Entering 'third_party/gloo' 2023-01-11T21:17:17.5974259Z Entering 'third_party/googletest' 2023-01-11T21:17:17.6016187Z Entering 'third_party/ideep' 2023-01-11T21:17:17.6056712Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T21:17:17.6100144Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T21:17:17.6149663Z Entering 'third_party/ios-cmake' 2023-01-11T21:17:17.6191810Z Entering 'third_party/ittapi' 2023-01-11T21:17:17.6233728Z Entering 'third_party/kineto' 2023-01-11T21:17:17.6275909Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:17:17.6316973Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:17:17.6360478Z Entering 'third_party/nccl/nccl' 2023-01-11T21:17:17.6402808Z Entering 'third_party/neon2sse' 2023-01-11T21:17:17.6444541Z Entering 'third_party/nlohmann' 2023-01-11T21:17:17.6487897Z Entering 'third_party/onnx' 2023-01-11T21:17:17.6543480Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T21:17:17.6584847Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T21:17:17.6628589Z Entering 'third_party/onnx-tensorrt' 2023-01-11T21:17:17.6669533Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:17:17.6715290Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:17:17.6756651Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 
2023-01-11T21:17:17.6798667Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:17:17.6846721Z Entering 'third_party/pocketfft' 2023-01-11T21:17:17.6888301Z Entering 'third_party/protobuf' 2023-01-11T21:17:17.6933950Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:17:17.6975437Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:17:17.7018504Z Entering 'third_party/psimd' 2023-01-11T21:17:17.7060103Z Entering 'third_party/pthreadpool' 2023-01-11T21:17:17.7102353Z Entering 'third_party/pybind11' 2023-01-11T21:17:17.7145019Z Entering 'third_party/python-enum' 2023-01-11T21:17:17.7186755Z Entering 'third_party/python-peachpy' 2023-01-11T21:17:17.7228406Z Entering 'third_party/python-six' 2023-01-11T21:17:17.7270262Z Entering 'third_party/sleef' 2023-01-11T21:17:17.7312778Z Entering 'third_party/tbb' 2023-01-11T21:17:17.7356706Z Entering 'third_party/tensorpipe' 2023-01-11T21:17:17.7398217Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:17:17.7439535Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:17:17.7480322Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:17:17.7521447Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:17:17.7561875Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:17:17.7606459Z Entering 'third_party/zstd' 2023-01-11T21:17:17.7661923Z [command]/usr/bin/git submodule foreach --recursive git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url 2023-01-11T21:17:17.7980285Z Entering 'android/libs/fbjni' 2023-01-11T21:17:17.8020656Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2023-01-11T21:17:17.8038726Z Entering 'third_party/FP16' 2023-01-11T21:17:17.8079086Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2023-01-11T21:17:17.8096968Z Entering 'third_party/FXdiv' 2023-01-11T21:17:17.8136554Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2023-01-11T21:17:17.8153953Z Entering 'third_party/NNPACK' 2023-01-11T21:17:17.8193792Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2023-01-11T21:17:17.8211441Z Entering 'third_party/QNNPACK' 2023-01-11T21:17:17.8250858Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/QNNPACK/config remote.origin.url 2023-01-11T21:17:17.8268915Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T21:17:17.8308056Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2023-01-11T21:17:17.8325448Z Entering 'third_party/XNNPACK' 2023-01-11T21:17:17.8364694Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2023-01-11T21:17:17.8394017Z Entering 'third_party/benchmark' 2023-01-11T21:17:17.8434445Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2023-01-11T21:17:17.8452569Z Entering 'third_party/cpuinfo' 2023-01-11T21:17:17.8491660Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2023-01-11T21:17:17.8509792Z Entering 'third_party/cub' 2023-01-11T21:17:17.8549853Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cub/config remote.origin.url 2023-01-11T21:17:17.8567536Z Entering 'third_party/cudnn_frontend' 2023-01-11T21:17:17.8606300Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config 
remote.origin.url 2023-01-11T21:17:17.8629968Z Entering 'third_party/cutlass' 2023-01-11T21:17:17.8668731Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2023-01-11T21:17:17.8693097Z Entering 'third_party/eigen' 2023-01-11T21:17:17.8731500Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/eigen/config remote.origin.url 2023-01-11T21:17:17.8751427Z Entering 'third_party/fbgemm' 2023-01-11T21:17:17.8791145Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2023-01-11T21:17:17.8808452Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:17:17.8847583Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/asmjit/config remote.origin.url 2023-01-11T21:17:17.8865227Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:17:17.8904017Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/cpuinfo/config remote.origin.url 2023-01-11T21:17:17.8921767Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:17:17.8960136Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/googletest/config remote.origin.url 2023-01-11T21:17:17.8978026Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:17:17.9016816Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/hipify_torch/config remote.origin.url 2023-01-11T21:17:17.9034824Z Entering 'third_party/flatbuffers' 2023-01-11T21:17:17.9073901Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2023-01-11T21:17:17.9093633Z Entering 'third_party/fmt' 2023-01-11T21:17:17.9132240Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2023-01-11T21:17:17.9150036Z Entering 'third_party/foxi' 2023-01-11T21:17:17.9188550Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/foxi/config remote.origin.url 2023-01-11T21:17:17.9206021Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:17:17.9244970Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2023-01-11T21:17:17.9263379Z Entering 'third_party/gloo' 2023-01-11T21:17:17.9302256Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2023-01-11T21:17:17.9319871Z Entering 'third_party/googletest' 2023-01-11T21:17:17.9359164Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2023-01-11T21:17:17.9377370Z Entering 'third_party/ideep' 2023-01-11T21:17:17.9416757Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2023-01-11T21:17:17.9433112Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T21:17:17.9472031Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2023-01-11T21:17:17.9491463Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T21:17:17.9530262Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/modules/third_party/oneDNN/config remote.origin.url 2023-01-11T21:17:17.9555677Z Entering 'third_party/ios-cmake' 2023-01-11T21:17:17.9594069Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ios-cmake/config remote.origin.url 2023-01-11T21:17:17.9611016Z Entering 'third_party/ittapi' 2023-01-11T21:17:17.9649748Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2023-01-11T21:17:17.9667187Z Entering 'third_party/kineto' 2023-01-11T21:17:17.9706416Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2023-01-11T21:17:17.9723575Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:17:17.9761890Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2023-01-11T21:17:17.9779573Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:17:17.9819176Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2023-01-11T21:17:17.9837305Z Entering 'third_party/nccl/nccl' 2023-01-11T21:17:17.9879462Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nccl/nccl/config remote.origin.url 2023-01-11T21:17:17.9897347Z Entering 'third_party/neon2sse' 2023-01-11T21:17:17.9936082Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/neon2sse/config remote.origin.url 2023-01-11T21:17:17.9953452Z Entering 'third_party/nlohmann' 2023-01-11T21:17:17.9992255Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2023-01-11T21:17:18.0010926Z Entering 'third_party/onnx' 2023-01-11T21:17:18.0050954Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2023-01-11T21:17:18.0081952Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T21:17:18.0120935Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/benchmark/config remote.origin.url 2023-01-11T21:17:18.0138690Z Entering 'third_party/onnx/third_party/pybind11' 
2023-01-11T21:17:18.0178670Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2023-01-11T21:17:18.0197880Z Entering 'third_party/onnx-tensorrt' 2023-01-11T21:17:18.0236940Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/config remote.origin.url 2023-01-11T21:17:18.0254945Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:17:18.0293148Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/config remote.origin.url 2023-01-11T21:17:18.0315398Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:17:18.0354375Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/benchmark/config remote.origin.url 2023-01-11T21:17:18.0371654Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T21:17:18.0410960Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2023-01-11T21:17:18.0428538Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:17:18.0468165Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2023-01-11T21:17:18.0490172Z Entering 'third_party/pocketfft' 2023-01-11T21:17:18.0528835Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2023-01-11T21:17:18.0546634Z Entering 'third_party/protobuf' 2023-01-11T21:17:18.0585814Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config 
remote.origin.url 2023-01-11T21:17:18.0606507Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:17:18.0645175Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2023-01-11T21:17:18.0663566Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:17:18.0701896Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2023-01-11T21:17:18.0721104Z Entering 'third_party/psimd' 2023-01-11T21:17:18.0759609Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2023-01-11T21:17:18.0777949Z Entering 'third_party/pthreadpool' 2023-01-11T21:17:18.0817224Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2023-01-11T21:17:18.0834487Z Entering 'third_party/pybind11' 2023-01-11T21:17:18.0873149Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2023-01-11T21:17:18.0890800Z Entering 'third_party/python-enum' 2023-01-11T21:17:18.0929515Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-enum/config remote.origin.url 2023-01-11T21:17:18.0946908Z Entering 'third_party/python-peachpy' 2023-01-11T21:17:18.0985259Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2023-01-11T21:17:18.1002653Z Entering 'third_party/python-six' 2023-01-11T21:17:18.1041381Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-six/config remote.origin.url 2023-01-11T21:17:18.1059228Z Entering 'third_party/sleef' 2023-01-11T21:17:18.1098146Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config 
remote.origin.url 2023-01-11T21:17:18.1115636Z Entering 'third_party/tbb' 2023-01-11T21:17:18.1154464Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tbb/config remote.origin.url 2023-01-11T21:17:18.1174293Z Entering 'third_party/tensorpipe' 2023-01-11T21:17:18.1213118Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2023-01-11T21:17:18.1230848Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:17:18.1269511Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2023-01-11T21:17:18.1286494Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:17:18.1324786Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2023-01-11T21:17:18.1341983Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:17:18.1380170Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2023-01-11T21:17:18.1397503Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:17:18.1436183Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2023-01-11T21:17:18.1452525Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:17:18.1491501Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2023-01-11T21:17:18.1511442Z Entering 'third_party/zstd' 2023-01-11T21:17:18.1551702Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/zstd/config remote.origin.url 2023-01-11T21:17:18.2540346Z [command]/usr/bin/git 
submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2023-01-11T21:17:18.2853032Z Entering 'android/libs/fbjni' 2023-01-11T21:17:18.2896966Z Entering 'third_party/FP16' 2023-01-11T21:17:18.2940659Z Entering 'third_party/FXdiv' 2023-01-11T21:17:18.2985664Z Entering 'third_party/NNPACK' 2023-01-11T21:17:18.3027990Z Entering 'third_party/QNNPACK' 2023-01-11T21:17:18.3070940Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T21:17:18.3116493Z Entering 'third_party/XNNPACK' 2023-01-11T21:17:18.3170227Z Entering 'third_party/benchmark' 2023-01-11T21:17:18.3212978Z Entering 'third_party/cpuinfo' 2023-01-11T21:17:18.3256881Z Entering 'third_party/cub' 2023-01-11T21:17:18.3298650Z Entering 'third_party/cudnn_frontend' 2023-01-11T21:17:18.3347627Z Entering 'third_party/cutlass' 2023-01-11T21:17:18.3397314Z Entering 'third_party/eigen' 2023-01-11T21:17:18.3442734Z Entering 'third_party/fbgemm' 2023-01-11T21:17:18.3485756Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:17:18.3529892Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:17:18.3572502Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:17:18.3615716Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:17:18.3660685Z Entering 'third_party/flatbuffers' 2023-01-11T21:17:18.3709051Z Entering 'third_party/fmt' 2023-01-11T21:17:18.3753383Z Entering 'third_party/foxi' 2023-01-11T21:17:18.3797874Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:17:18.3842531Z Entering 'third_party/gloo' 2023-01-11T21:17:18.3885562Z Entering 'third_party/googletest' 2023-01-11T21:17:18.3930185Z Entering 'third_party/ideep' 2023-01-11T21:17:18.3972100Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T21:17:18.4017010Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T21:17:18.4066906Z Entering 'third_party/ios-cmake' 2023-01-11T21:17:18.4109396Z Entering 'third_party/ittapi' 
2023-01-11T21:17:18.4151743Z Entering 'third_party/kineto' 2023-01-11T21:17:18.4194485Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:17:18.4237453Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:17:18.4282758Z Entering 'third_party/nccl/nccl' 2023-01-11T21:17:18.4325414Z Entering 'third_party/neon2sse' 2023-01-11T21:17:18.4368170Z Entering 'third_party/nlohmann' 2023-01-11T21:17:18.4411875Z Entering 'third_party/onnx' 2023-01-11T21:17:18.4468926Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T21:17:18.4512171Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T21:17:18.4556683Z Entering 'third_party/onnx-tensorrt' 2023-01-11T21:17:18.4598186Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:17:18.4645660Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:17:18.4688975Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T21:17:18.4731301Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:17:18.4779693Z Entering 'third_party/pocketfft' 2023-01-11T21:17:18.4821886Z Entering 'third_party/protobuf' 2023-01-11T21:17:18.4868732Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:17:18.4910758Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:17:18.4954540Z Entering 'third_party/psimd' 2023-01-11T21:17:18.4997211Z Entering 'third_party/pthreadpool' 2023-01-11T21:17:18.5040417Z Entering 'third_party/pybind11' 2023-01-11T21:17:18.5083479Z Entering 'third_party/python-enum' 2023-01-11T21:17:18.5128158Z Entering 'third_party/python-peachpy' 2023-01-11T21:17:18.5172496Z Entering 'third_party/python-six' 2023-01-11T21:17:18.5216264Z Entering 'third_party/sleef' 2023-01-11T21:17:18.5259051Z Entering 'third_party/tbb' 2023-01-11T21:17:18.5303734Z Entering 'third_party/tensorpipe' 2023-01-11T21:17:18.5347530Z Entering 
'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:17:18.5390976Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:17:18.5433751Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:17:18.5476314Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:17:18.5518350Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:17:18.5563323Z Entering 'third_party/zstd' 2023-01-11T21:17:18.5620519Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2023-01-11T21:17:18.5932509Z Entering 'android/libs/fbjni' 2023-01-11T21:17:18.5976142Z Entering 'third_party/FP16' 2023-01-11T21:17:18.6018706Z Entering 'third_party/FXdiv' 2023-01-11T21:17:18.6062203Z Entering 'third_party/NNPACK' 2023-01-11T21:17:18.6104568Z Entering 'third_party/QNNPACK' 2023-01-11T21:17:18.6147769Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T21:17:18.6191206Z Entering 'third_party/XNNPACK' 2023-01-11T21:17:18.6245038Z Entering 'third_party/benchmark' 2023-01-11T21:17:18.6287982Z Entering 'third_party/cpuinfo' 2023-01-11T21:17:18.6331642Z Entering 'third_party/cub' 2023-01-11T21:17:18.6374922Z Entering 'third_party/cudnn_frontend' 2023-01-11T21:17:18.6425184Z Entering 'third_party/cutlass' 2023-01-11T21:17:18.6474707Z Entering 'third_party/eigen' 2023-01-11T21:17:18.6520288Z Entering 'third_party/fbgemm' 2023-01-11T21:17:18.6563597Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:17:18.6605681Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:17:18.6649158Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:17:18.6691556Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:17:18.6736032Z Entering 'third_party/flatbuffers' 2023-01-11T21:17:18.6780860Z Entering 'third_party/fmt' 2023-01-11T21:17:18.6823656Z Entering 'third_party/foxi' 2023-01-11T21:17:18.6866151Z 
Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:17:18.6908425Z Entering 'third_party/gloo' 2023-01-11T21:17:18.6950887Z Entering 'third_party/googletest' 2023-01-11T21:17:18.6994397Z Entering 'third_party/ideep' 2023-01-11T21:17:18.7036351Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T21:17:18.7083904Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T21:17:18.7137170Z Entering 'third_party/ios-cmake' 2023-01-11T21:17:18.7180723Z Entering 'third_party/ittapi' 2023-01-11T21:17:18.7224714Z Entering 'third_party/kineto' 2023-01-11T21:17:18.7267602Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:17:18.7310242Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:17:18.7354751Z Entering 'third_party/nccl/nccl' 2023-01-11T21:17:18.7397588Z Entering 'third_party/neon2sse' 2023-01-11T21:17:18.7440006Z Entering 'third_party/nlohmann' 2023-01-11T21:17:18.7484088Z Entering 'third_party/onnx' 2023-01-11T21:17:18.7539919Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T21:17:18.7583153Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T21:17:18.7628911Z Entering 'third_party/onnx-tensorrt' 2023-01-11T21:17:18.7671962Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:17:18.7719371Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:17:18.7762652Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T21:17:18.7805736Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:17:18.7854367Z Entering 'third_party/pocketfft' 2023-01-11T21:17:18.7896165Z Entering 'third_party/protobuf' 2023-01-11T21:17:18.7942216Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:17:18.7984764Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:17:18.8029628Z Entering 'third_party/psimd' 2023-01-11T21:17:18.8072134Z Entering 
'third_party/pthreadpool' 2023-01-11T21:17:18.8114913Z Entering 'third_party/pybind11' 2023-01-11T21:17:18.8158086Z Entering 'third_party/python-enum' 2023-01-11T21:17:18.8200698Z Entering 'third_party/python-peachpy' 2023-01-11T21:17:18.8244103Z Entering 'third_party/python-six' 2023-01-11T21:17:18.8286705Z Entering 'third_party/sleef' 2023-01-11T21:17:18.8329907Z Entering 'third_party/tbb' 2023-01-11T21:17:18.8376511Z Entering 'third_party/tensorpipe' 2023-01-11T21:17:18.8419435Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:17:18.8461174Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:17:18.8503135Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:17:18.8545951Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:17:18.8588290Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:17:18.8641014Z Entering 'third_party/zstd' 2023-01-11T21:17:18.8688322Z ##[endgroup] 2023-01-11T21:17:18.8733440Z [command]/usr/bin/git log -1 --format='%H' 2023-01-11T21:17:18.8762266Z '8419ddda87c8a47eacc63b54bc7ec98c1f27c26e' 2023-01-11T21:17:18.8911904Z Prepare all required actions 2023-01-11T21:17:18.8944540Z ##[group]Run ./.github/actions/setup-linux 2023-01-11T21:17:18.8944820Z env: 2023-01-11T21:17:18.8945045Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:17:18.8945304Z ##[endgroup] 2023-01-11T21:17:18.8963231Z ##[group]Run set -euo pipefail 2023-01-11T21:17:18.8963525Z set -euo pipefail 2023-01-11T21:17:18.8963807Z function get_ec2_metadata() { 2023-01-11T21:17:18.8964143Z  # Pulled from instance metadata endpoint for EC2 2023-01-11T21:17:18.8964607Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2023-01-11T21:17:18.8964988Z  category=$1 2023-01-11T21:17:18.8965305Z  curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2023-01-11T21:17:18.8965613Z } 2023-01-11T21:17:18.8965869Z echo "ami-id: $(get_ec2_metadata ami-id)" 
2023-01-11T21:17:18.8966248Z echo "instance-id: $(get_ec2_metadata instance-id)" 2023-01-11T21:17:18.8966621Z echo "instance-type: $(get_ec2_metadata instance-type)" 2023-01-11T21:17:18.8966963Z echo "system info $(uname -a)" 2023-01-11T21:17:18.8981277Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:17:18.8981560Z env: 2023-01-11T21:17:18.8981799Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:17:18.8982059Z ##[endgroup] 2023-01-11T21:17:18.9082467Z ami-id: ami-096198a0bccc6bad4 2023-01-11T21:17:18.9146330Z instance-id: i-0a2cfe12f2970a977 2023-01-11T21:17:18.9209139Z instance-type: g3.8xlarge 2023-01-11T21:17:18.9218013Z system info Linux ip-10-0-0-121.ec2.internal 4.14.252-195.483.amzn2.x86_64 #1 SMP Mon Nov 1 20:58:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux 2023-01-11T21:17:18.9236146Z ##[group]Run if systemctl is-active --quiet docker; then 2023-01-11T21:17:18.9236529Z if systemctl is-active --quiet docker; then 2023-01-11T21:17:18.9236865Z  echo "Docker daemon is running..."; 2023-01-11T21:17:18.9237139Z else 2023-01-11T21:17:18.9237458Z  echo "Starting docker deamon..." && sudo systemctl start docker; 2023-01-11T21:17:18.9237768Z fi 2023-01-11T21:17:18.9249941Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:17:18.9250239Z env: 2023-01-11T21:17:18.9250481Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:17:18.9250724Z ##[endgroup] 2023-01-11T21:17:18.9300819Z Docker daemon is running... 
2023-01-11T21:17:18.9320307Z ##[group]Run AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2023-01-11T21:17:18.9320820Z AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2023-01-11T21:17:18.9321242Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2023-01-11T21:17:18.9321782Z retry aws ecr get-login*** "$AWS_DEFAULT_REGION" | docker login --username AWS \ 2023-01-11T21:17:18.9322285Z  --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" 2023-01-11T21:17:18.9335195Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:17:18.9335542Z env: 2023-01-11T21:17:18.9335816Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:17:18.9336134Z AWS_RETRY_MODE: standard 2023-01-11T21:17:18.9336445Z AWS_MAX_ATTEMPTS: 5 2023-01-11T21:17:18.9336766Z AWS_DEFAULT_REGION: us-east-1 2023-01-11T21:17:18.9337053Z ##[endgroup] 2023-01-11T21:17:19.8744331Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2023-01-11T21:17:19.8744999Z Configure a credential helper to remove this warning. 
See 2023-01-11T21:17:19.8746738Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2023-01-11T21:17:19.8747260Z Login Succeeded 2023-01-11T21:17:19.8747445Z 2023-01-11T21:17:19.8781611Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2023-01-11T21:17:19.8782022Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2023-01-11T21:17:19.8782533Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2023-01-11T21:17:19.8795952Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:17:19.8796257Z env: 2023-01-11T21:17:19.8796504Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:17:19.8796753Z ##[endgroup] 2023-01-11T21:17:19.8888892Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2023-01-11T21:17:19.8889304Z with: 2023-01-11T21:17:19.8889819Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:17:19.8890332Z env: 2023-01-11T21:17:19.8890615Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:17:19.8890897Z ##[endgroup] 2023-01-11T21:17:19.8909022Z ##[group]Run retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2023-01-11T21:17:19.8909439Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2023-01-11T21:17:19.8909845Z # ignore output since only exit code is used for conditional 2023-01-11T21:17:19.8910279Z # only pull docker image if it's not available locally 2023-01-11T21:17:19.8910707Z if ! 
docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2023-01-11T21:17:19.8911166Z  retry docker pull "${DOCKER_IMAGE}" 2023-01-11T21:17:19.8911469Z fi 2023-01-11T21:17:19.8923957Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:17:19.8924284Z env: 2023-01-11T21:17:19.8924573Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:17:19.8925141Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:17:19.8925667Z ##[endgroup] 2023-01-11T21:17:20.1394428Z fd224c2e6c79d7fdec6408da598bf52bc5b201dd: Pulling from pytorch/pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7 2023-01-11T21:17:20.1395371Z fb668870d8a7: Pulling fs layer 2023-01-11T21:17:20.1396023Z 4542784317be: Pulling fs layer 2023-01-11T21:17:20.1396716Z e0bec5df5af5: Pulling fs layer 2023-01-11T21:17:20.1397360Z 4053f75740ab: Pulling fs layer 2023-01-11T21:17:20.1398005Z 57e09105cdfd: Pulling fs layer 2023-01-11T21:17:20.1400675Z 606761d225e5: Pulling fs layer 2023-01-11T21:17:20.1401406Z 69473a703fb4: Pulling fs layer 2023-01-11T21:17:20.1402076Z a08ab4e0594b: Pulling fs layer 2023-01-11T21:17:20.1402782Z 4cd507bccac2: Pulling fs layer 2023-01-11T21:17:20.1403439Z 606761d225e5: Waiting 2023-01-11T21:17:20.1404070Z fa92f16621a4: Pulling fs layer 2023-01-11T21:17:20.1404868Z a08ab4e0594b: Waiting 2023-01-11T21:17:20.1405494Z 57e09105cdfd: Waiting 2023-01-11T21:17:20.1406103Z 4cd507bccac2: Waiting 2023-01-11T21:17:20.1406759Z 6dc2b05bd224: Pulling fs layer 2023-01-11T21:17:20.1407423Z ce4a87d45645: Pulling fs layer 2023-01-11T21:17:20.1408053Z 41860ea59b6c: Pulling fs layer 2023-01-11T21:17:20.1408687Z 69473a703fb4: Waiting 2023-01-11T21:17:20.1409310Z 6dc2b05bd224: Waiting 2023-01-11T21:17:20.1409934Z 87d0ffa55850: Pulling fs layer 2023-01-11T21:17:20.1411111Z f9f75aaba8d7: Pulling fs layer 2023-01-11T21:17:20.1411792Z 0c06be5c20e0: Pulling fs layer 
2023-01-11T21:17:20.1412440Z d23c0a07b67c: Pulling fs layer 2023-01-11T21:17:20.1413728Z 41860ea59b6c: Waiting 2023-01-11T21:17:20.1414348Z 0c06be5c20e0: Waiting 2023-01-11T21:17:20.1414953Z ce4a87d45645: Waiting 2023-01-11T21:17:20.1415592Z 1001f0d2f3d0: Pulling fs layer 2023-01-11T21:17:20.1416257Z e1c655e7ec0e: Pulling fs layer 2023-01-11T21:17:20.1416837Z fa92f16621a4: Waiting 2023-01-11T21:17:20.1417466Z a11b4b5fd784: Pulling fs layer 2023-01-11T21:17:20.1418138Z bc41eab7f454: Pulling fs layer 2023-01-11T21:17:20.1418742Z 1001f0d2f3d0: Waiting 2023-01-11T21:17:20.1419308Z a11b4b5fd784: Waiting 2023-01-11T21:17:20.1419915Z e1c655e7ec0e: Waiting 2023-01-11T21:17:20.1420550Z b8f759fd0191: Pulling fs layer 2023-01-11T21:17:20.1421120Z 87d0ffa55850: Waiting 2023-01-11T21:17:20.1421761Z f410dcc9d0be: Pulling fs layer 2023-01-11T21:17:20.1422428Z 90d8f9bbe048: Pulling fs layer 2023-01-11T21:17:20.1423081Z eedfbaa04e4f: Pulling fs layer 2023-01-11T21:17:20.1423729Z bc41eab7f454: Waiting 2023-01-11T21:17:20.1424344Z 90d8f9bbe048: Waiting 2023-01-11T21:17:20.1424944Z f410dcc9d0be: Waiting 2023-01-11T21:17:20.1425619Z 2f2308643d60: Pulling fs layer 2023-01-11T21:17:20.1426705Z c1a92fad2c2c: Pulling fs layer 2023-01-11T21:17:20.1427372Z 47037a50f270: Pulling fs layer 2023-01-11T21:17:20.1428025Z 1a2fd7b216d7: Pulling fs layer 2023-01-11T21:17:20.1428663Z 4053f75740ab: Waiting 2023-01-11T21:17:20.1429216Z 765839304d2e: Pulling fs layer 2023-01-11T21:17:20.1429844Z f9f75aaba8d7: Waiting 2023-01-11T21:17:20.1430483Z e51794baeb92: Pulling fs layer 2023-01-11T21:17:20.1431076Z c1a92fad2c2c: Waiting 2023-01-11T21:17:20.1431654Z ea4bfeaa0fc7: Pulling fs layer 2023-01-11T21:17:20.1432304Z 765839304d2e: Waiting 2023-01-11T21:17:20.1432935Z d8065d17513d: Pulling fs layer 2023-01-11T21:17:20.1433517Z ea4bfeaa0fc7: Waiting 2023-01-11T21:17:20.1434238Z 6d83ca3dedf3: Pulling fs layer 2023-01-11T21:17:20.1434910Z 12ddc57b99eb: Pulling fs layer 2023-01-11T21:17:20.1435484Z b590670d273c: 
Pulling fs layer 2023-01-11T21:17:20.1436124Z 8afbc57dfec9: Pulling fs layer 2023-01-11T21:17:20.1436779Z 29a7c0d5fa4c: Pulling fs layer 2023-01-11T21:17:20.1437391Z d8065d17513d: Waiting 2023-01-11T21:17:20.1438016Z 16825bb02017: Pulling fs layer 2023-01-11T21:17:20.1438643Z 6d83ca3dedf3: Waiting 2023-01-11T21:17:20.1439250Z eedfbaa04e4f: Waiting 2023-01-11T21:17:20.1439861Z b590670d273c: Waiting 2023-01-11T21:17:20.1440489Z 8afbc57dfec9: Waiting 2023-01-11T21:17:20.1441097Z bdf297d7f88c: Pulling fs layer 2023-01-11T21:17:20.1441737Z 12ddc57b99eb: Waiting 2023-01-11T21:17:20.1442351Z 16825bb02017: Waiting 2023-01-11T21:17:20.1442939Z 29a7c0d5fa4c: Waiting 2023-01-11T21:17:20.1443578Z 885c12efa4ae: Pulling fs layer 2023-01-11T21:17:20.1444214Z bdf297d7f88c: Waiting 2023-01-11T21:17:20.1444821Z 28c5689cb975: Pulling fs layer 2023-01-11T21:17:20.1445467Z cca768f96df4: Pulling fs layer 2023-01-11T21:17:20.1446107Z 904b81494b5e: Pulling fs layer 2023-01-11T21:17:20.1446735Z 61eecfa8b34e: Pulling fs layer 2023-01-11T21:17:20.1447377Z 885c12efa4ae: Waiting 2023-01-11T21:17:20.1447979Z 28c5689cb975: Waiting 2023-01-11T21:17:20.1448577Z 95c1ac011645: Pulling fs layer 2023-01-11T21:17:20.1449228Z 07cee023724c: Pulling fs layer 2023-01-11T21:17:20.1449947Z cca768f96df4: Waiting 2023-01-11T21:17:20.1450505Z 904b81494b5e: Waiting 2023-01-11T21:17:20.1451128Z 195d560d8cf6: Pulling fs layer 2023-01-11T21:17:20.1451764Z 61eecfa8b34e: Waiting 2023-01-11T21:17:20.1452335Z a399389c7f8e: Pulling fs layer 2023-01-11T21:17:20.1453310Z 95c1ac011645: Waiting 2023-01-11T21:17:20.1453937Z a399389c7f8e: Waiting 2023-01-11T21:17:20.1454558Z 7447f84b33ef: Pulling fs layer 2023-01-11T21:17:20.1455181Z 7447f84b33ef: Waiting 2023-01-11T21:17:20.1455804Z 2f2308643d60: Waiting 2023-01-11T21:17:20.1456413Z 0d8aeb1421f9: Pulling fs layer 2023-01-11T21:17:20.1457057Z 02048a597c22: Pulling fs layer 2023-01-11T21:17:20.1457680Z 25d615d8a5e2: Pulling fs layer 2023-01-11T21:17:20.1458621Z 09d400b86049: 
Pulling fs layer 2023-01-11T21:17:20.1459198Z 0d8aeb1421f9: Waiting 2023-01-11T21:17:20.1459793Z 02048a597c22: Waiting 2023-01-11T21:17:20.3438937Z 4542784317be: Verifying Checksum 2023-01-11T21:17:20.3439448Z 4542784317be: Download complete 2023-01-11T21:17:20.4279726Z 4053f75740ab: Verifying Checksum 2023-01-11T21:17:20.4280096Z 4053f75740ab: Download complete 2023-01-11T21:17:20.4635435Z fb668870d8a7: Verifying Checksum 2023-01-11T21:17:20.4636000Z fb668870d8a7: Download complete 2023-01-11T21:17:20.5137198Z 57e09105cdfd: Verifying Checksum 2023-01-11T21:17:20.5137557Z 57e09105cdfd: Download complete 2023-01-11T21:17:20.6457847Z 69473a703fb4: Verifying Checksum 2023-01-11T21:17:20.6458243Z 69473a703fb4: Download complete 2023-01-11T21:17:20.6814624Z e0bec5df5af5: Verifying Checksum 2023-01-11T21:17:20.6815000Z e0bec5df5af5: Download complete 2023-01-11T21:17:20.7607249Z 4cd507bccac2: Verifying Checksum 2023-01-11T21:17:20.7607980Z 4cd507bccac2: Download complete 2023-01-11T21:17:20.7687752Z a08ab4e0594b: Verifying Checksum 2023-01-11T21:17:20.7688413Z a08ab4e0594b: Download complete 2023-01-11T21:17:20.8573226Z 6dc2b05bd224: Verifying Checksum 2023-01-11T21:17:20.8573562Z 6dc2b05bd224: Download complete 2023-01-11T21:17:20.9632259Z ce4a87d45645: Verifying Checksum 2023-01-11T21:17:20.9632872Z ce4a87d45645: Download complete 2023-01-11T21:17:21.2000647Z fb668870d8a7: Pull complete 2023-01-11T21:17:21.5878686Z 4542784317be: Pull complete 2023-01-11T21:17:23.0049248Z 41860ea59b6c: Verifying Checksum 2023-01-11T21:17:23.0049760Z 41860ea59b6c: Download complete 2023-01-11T21:17:23.1119847Z 87d0ffa55850: Verifying Checksum 2023-01-11T21:17:23.1120181Z 87d0ffa55850: Download complete 2023-01-11T21:17:23.1864966Z f9f75aaba8d7: Verifying Checksum 2023-01-11T21:17:23.1865301Z f9f75aaba8d7: Download complete 2023-01-11T21:17:23.2679631Z 0c06be5c20e0: Verifying Checksum 2023-01-11T21:17:23.2679964Z 0c06be5c20e0: Download complete 2023-01-11T21:17:23.3131011Z e0bec5df5af5: 
Pull complete 2023-01-11T21:17:23.4355860Z 4053f75740ab: Pull complete 2023-01-11T21:17:23.5602619Z 57e09105cdfd: Pull complete 2023-01-11T21:17:24.2412328Z d23c0a07b67c: Verifying Checksum 2023-01-11T21:17:24.2412672Z d23c0a07b67c: Download complete 2023-01-11T21:17:24.3052257Z 1001f0d2f3d0: Verifying Checksum 2023-01-11T21:17:24.3052611Z 1001f0d2f3d0: Download complete 2023-01-11T21:17:24.3789892Z e1c655e7ec0e: Download complete 2023-01-11T21:17:31.4025637Z 606761d225e5: Verifying Checksum 2023-01-11T21:17:31.4025982Z 606761d225e5: Download complete 2023-01-11T21:17:31.4771247Z bc41eab7f454: Verifying Checksum 2023-01-11T21:17:31.4771587Z bc41eab7f454: Download complete 2023-01-11T21:17:31.5523742Z b8f759fd0191: Download complete 2023-01-11T21:17:31.6215613Z f410dcc9d0be: Verifying Checksum 2023-01-11T21:17:31.6215941Z f410dcc9d0be: Download complete 2023-01-11T21:17:31.7015363Z 90d8f9bbe048: Verifying Checksum 2023-01-11T21:17:31.7015710Z 90d8f9bbe048: Download complete 2023-01-11T21:17:31.7867001Z eedfbaa04e4f: Verifying Checksum 2023-01-11T21:17:31.7867352Z eedfbaa04e4f: Download complete 2023-01-11T21:17:31.8756239Z 2f2308643d60: Verifying Checksum 2023-01-11T21:17:31.8756585Z 2f2308643d60: Download complete 2023-01-11T21:17:32.8200326Z c1a92fad2c2c: Verifying Checksum 2023-01-11T21:17:32.8200732Z c1a92fad2c2c: Download complete 2023-01-11T21:17:32.9912142Z 47037a50f270: Verifying Checksum 2023-01-11T21:17:32.9912528Z 47037a50f270: Download complete 2023-01-11T21:17:33.0723653Z 1a2fd7b216d7: Verifying Checksum 2023-01-11T21:17:33.0723987Z 1a2fd7b216d7: Download complete 2023-01-11T21:17:33.1731516Z 765839304d2e: Verifying Checksum 2023-01-11T21:17:33.1731843Z 765839304d2e: Download complete 2023-01-11T21:17:33.2467037Z e51794baeb92: Verifying Checksum 2023-01-11T21:17:33.2467376Z e51794baeb92: Download complete 2023-01-11T21:17:33.3383881Z ea4bfeaa0fc7: Verifying Checksum 2023-01-11T21:17:33.3384238Z ea4bfeaa0fc7: Download complete 
2023-01-11T21:17:34.9697323Z fa92f16621a4: Verifying Checksum 2023-01-11T21:17:35.0473476Z 6d83ca3dedf3: Verifying Checksum 2023-01-11T21:17:35.0474175Z 6d83ca3dedf3: Download complete 2023-01-11T21:17:35.1423343Z 12ddc57b99eb: Verifying Checksum 2023-01-11T21:17:35.1424035Z 12ddc57b99eb: Download complete 2023-01-11T21:17:35.4123373Z d8065d17513d: Verifying Checksum 2023-01-11T21:17:35.4124274Z d8065d17513d: Download complete 2023-01-11T21:17:35.4932214Z 8afbc57dfec9: Verifying Checksum 2023-01-11T21:17:35.4933132Z 8afbc57dfec9: Download complete 2023-01-11T21:17:35.5819702Z 29a7c0d5fa4c: Verifying Checksum 2023-01-11T21:17:35.5820475Z 29a7c0d5fa4c: Download complete 2023-01-11T21:17:35.6585862Z b590670d273c: Verifying Checksum 2023-01-11T21:17:35.6586561Z b590670d273c: Download complete 2023-01-11T21:17:35.7298649Z bdf297d7f88c: Verifying Checksum 2023-01-11T21:17:35.7299360Z bdf297d7f88c: Download complete 2023-01-11T21:17:35.8602234Z 16825bb02017: Verifying Checksum 2023-01-11T21:17:35.8602886Z 16825bb02017: Download complete 2023-01-11T21:17:35.9830892Z 28c5689cb975: Verifying Checksum 2023-01-11T21:17:35.9831672Z 28c5689cb975: Download complete 2023-01-11T21:17:36.0883491Z cca768f96df4: Verifying Checksum 2023-01-11T21:17:36.0884218Z cca768f96df4: Download complete 2023-01-11T21:17:36.2172385Z 885c12efa4ae: Verifying Checksum 2023-01-11T21:17:36.2173370Z 885c12efa4ae: Download complete 2023-01-11T21:17:36.2907515Z 61eecfa8b34e: Verifying Checksum 2023-01-11T21:17:36.2907862Z 61eecfa8b34e: Download complete 2023-01-11T21:17:36.3723409Z 95c1ac011645: Verifying Checksum 2023-01-11T21:17:36.3724037Z 95c1ac011645: Download complete 2023-01-11T21:17:36.5131985Z 07cee023724c: Download complete 2023-01-11T21:17:36.5950959Z 195d560d8cf6: Verifying Checksum 2023-01-11T21:17:36.5951539Z 195d560d8cf6: Download complete 2023-01-11T21:17:36.7878161Z a399389c7f8e: Verifying Checksum 2023-01-11T21:17:36.7878723Z a399389c7f8e: Download complete 2023-01-11T21:17:36.8712252Z 
7447f84b33ef: Verifying Checksum 2023-01-11T21:17:36.8712928Z 7447f84b33ef: Download complete 2023-01-11T21:17:37.4722845Z 0d8aeb1421f9: Verifying Checksum 2023-01-11T21:17:37.4723230Z 0d8aeb1421f9: Download complete 2023-01-11T21:17:37.5602094Z 02048a597c22: Verifying Checksum 2023-01-11T21:17:37.5602482Z 02048a597c22: Download complete 2023-01-11T21:17:38.8637789Z 904b81494b5e: Verifying Checksum 2023-01-11T21:17:38.8638343Z 904b81494b5e: Download complete 2023-01-11T21:17:38.9403439Z 09d400b86049: Verifying Checksum 2023-01-11T21:17:38.9403774Z 09d400b86049: Download complete 2023-01-11T21:17:44.7942304Z 606761d225e5: Pull complete 2023-01-11T21:17:44.9165819Z 69473a703fb4: Pull complete 2023-01-11T21:17:45.0194794Z a08ab4e0594b: Pull complete 2023-01-11T21:17:45.1427048Z 4cd507bccac2: Pull complete 2023-01-11T21:18:06.5802785Z fa92f16621a4: Pull complete 2023-01-11T21:18:08.4591943Z 6dc2b05bd224: Pull complete 2023-01-11T21:18:10.3695323Z ce4a87d45645: Pull complete 2023-01-11T21:18:18.1112320Z 41860ea59b6c: Pull complete 2023-01-11T21:18:19.9583086Z 87d0ffa55850: Pull complete 2023-01-11T21:18:21.9626003Z f9f75aaba8d7: Pull complete 2023-01-11T21:18:24.8777990Z 0c06be5c20e0: Pull complete 2023-01-11T21:18:25.1940064Z a11b4b5fd784: Verifying Checksum 2023-01-11T21:18:25.1940436Z a11b4b5fd784: Download complete 2023-01-11T21:18:29.2929012Z d23c0a07b67c: Pull complete 2023-01-11T21:18:31.1717675Z 1001f0d2f3d0: Pull complete 2023-01-11T21:18:33.1730865Z e1c655e7ec0e: Pull complete 2023-01-11T21:18:56.5608965Z 25d615d8a5e2: Verifying Checksum 2023-01-11T21:18:56.5611370Z 25d615d8a5e2: Download complete 2023-01-11T21:19:05.4478357Z a11b4b5fd784: Pull complete 2023-01-11T21:19:07.3183216Z bc41eab7f454: Pull complete 2023-01-11T21:19:09.2234622Z b8f759fd0191: Pull complete 2023-01-11T21:19:11.2208517Z f410dcc9d0be: Pull complete 2023-01-11T21:19:13.2089446Z 90d8f9bbe048: Pull complete 2023-01-11T21:19:15.5640153Z eedfbaa04e4f: Pull complete 
2023-01-11T21:19:16.9972405Z 2f2308643d60: Pull complete 2023-01-11T21:19:21.1132041Z c1a92fad2c2c: Pull complete 2023-01-11T21:19:23.0521054Z 47037a50f270: Pull complete 2023-01-11T21:19:25.2845104Z 1a2fd7b216d7: Pull complete 2023-01-11T21:19:28.1025698Z 765839304d2e: Pull complete 2023-01-11T21:19:31.5161889Z e51794baeb92: Pull complete 2023-01-11T21:19:35.8234988Z ea4bfeaa0fc7: Pull complete 2023-01-11T21:19:44.0914712Z d8065d17513d: Pull complete 2023-01-11T21:19:45.9401205Z 6d83ca3dedf3: Pull complete 2023-01-11T21:19:47.7224008Z 12ddc57b99eb: Pull complete 2023-01-11T21:19:50.5809357Z b590670d273c: Pull complete 2023-01-11T21:19:52.3123283Z 8afbc57dfec9: Pull complete 2023-01-11T21:19:53.5547760Z 29a7c0d5fa4c: Pull complete 2023-01-11T21:19:54.1014612Z 16825bb02017: Pull complete 2023-01-11T21:19:54.3711313Z bdf297d7f88c: Pull complete 2023-01-11T21:19:55.7493703Z 885c12efa4ae: Pull complete 2023-01-11T21:19:55.8554139Z 28c5689cb975: Pull complete 2023-01-11T21:19:55.9441018Z cca768f96df4: Pull complete 2023-01-11T21:20:00.4969143Z 904b81494b5e: Pull complete 2023-01-11T21:20:00.6329726Z 61eecfa8b34e: Pull complete 2023-01-11T21:20:00.8196174Z 95c1ac011645: Pull complete 2023-01-11T21:20:01.0205396Z 07cee023724c: Pull complete 2023-01-11T21:20:01.1696428Z 195d560d8cf6: Pull complete 2023-01-11T21:20:01.9601976Z a399389c7f8e: Pull complete 2023-01-11T21:20:02.1206574Z 7447f84b33ef: Pull complete 2023-01-11T21:20:05.2795753Z 0d8aeb1421f9: Pull complete 2023-01-11T21:20:05.3848369Z 02048a597c22: Pull complete 2023-01-11T21:20:33.4530776Z 25d615d8a5e2: Pull complete 2023-01-11T21:20:33.5524856Z 09d400b86049: Pull complete 2023-01-11T21:20:33.5677088Z Digest: sha256:0da23f4faf0ce20770149c4a783e08eaa91c07112511dc5511c77937c66edb24 2023-01-11T21:20:33.5721408Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:20:33.5771995Z 
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:20:33.5873441Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main 2023-01-11T21:20:33.5873793Z with: 2023-01-11T21:20:33.5874038Z driver-version: 515.76 2023-01-11T21:20:33.5874281Z env: 2023-01-11T21:20:33.5874499Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:20:33.5874758Z ##[endgroup] 2023-01-11T21:20:33.5908733Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2023-01-11T21:20:33.5909035Z with: 2023-01-11T21:20:33.5909272Z timeout_minutes: 10 2023-01-11T21:20:33.5909525Z max_attempts: 3 2023-01-11T21:20:33.5916290Z command: # Is it disgusting to have a full shell script here in this github action? Sure # But is it the best way to make it so that this action relies on nothing else? Absolutely set -eou pipefail DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID) DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo" install_nvidia_docker2_amzn2() { ( set -x # Needed for yum-config-manager sudo yum install -y yum-utils sudo yum-config-manager --add-repo "${YUM_REPO_URL}" sudo yum install -y nvidia-docker2 sudo systemctl restart docker ) } install_nvidia_docker2_ubuntu20() { ( set -x sudo apt-get install -y nvidia-docker2 sudo systemctl restart docker ) } pre_install_nvidia_driver_amzn2() { ( # Purge any nvidia driver installed from RHEL repo sudo yum remove -y nvidia-driver-latest-dkms ) } install_nvidia_driver_common() { ( # Try to gather more information about the runner and its existing NVIDIA driver if any echo "Before installing NVIDIA driver" lspci lsmod modinfo nvidia || true HAS_NVIDIA_DRIVER=0 # Check if NVIDIA driver has already been installed if [ -x "$(command -v nvidia-smi)" ]; then set +e # The driver exists, check its version next. 
Also check only the first GPU if there are more than one of them # so that the same driver version is not print over multiple lines INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). Continuing" elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing" else HAS_NVIDIA_DRIVER=1 echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation" fi set -e fi if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then # CAUTION: this may need to be updated in future if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then sudo yum groupinstall -y "Development Tools" # ensure our kernel install is the same as our underlying kernel, # groupinstall "Development Tools" has a habit of mismatching kernel headers sudo yum install -y "kernel-devel-uname-r == $(uname -r)" sudo modprobe backlight fi sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN" set +e sudo /bin/bash /tmp/nvidia_driver -s --no-drm NVIDIA_INSTALLATION_STATUS=$? RESET_GPU=0 if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then sudo cat /var/log/nvidia-installer.log # Fail to install NVIDIA driver, try to reset the GPU RESET_GPU=1 elif [ -x "$(command -v nvidia-smi)" ]; then # Check again if nvidia-smi works even if the driver installation completes successfully INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? 
if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then RESET_GPU=1 fi fi if [ "$RESET_GPU" -eq 1 ]; then NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1) # The GPU can get stuck in a failure state if somehow the test crashs the GPU microcode. When this # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388 for PCI_ID in $NVIDIA_DEVICES; do DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable) echo "Reseting $PCI_ID (enabled state: $DEVICE_ENABLED)" # This requires sudo permission of course echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset sleep 1 done fi sudo rm -fv /tmp/nvidia_driver set -e fi ) } post_install_nvidia_driver_common() { ( sudo modprobe nvidia || true echo "After installing NVIDIA driver" lspci lsmod modinfo nvidia || true ( set +e nvidia-smi NVIDIA_SMI_STATUS=$? # Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285 if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}" else echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}" exit ${NVIDIA_SMI_STATUS} fi set -e ) ) } install_nvidia_driver_amzn2() { ( set -x pre_install_nvidia_driver_amzn2 install_nvidia_driver_common post_install_nvidia_driver_common ) } install_nvidia_driver_ubuntu20() { ( set -x install_nvidia_driver_common post_install_nvidia_driver_common ) } echo "== Installing nvidia driver ${DRIVER_FN} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_driver_amzn2 ;; ubuntu20.04) install_nvidia_driver_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Install container toolkit based on distribution echo "== Installing nvidia container toolkit for ${DISTRIBUTION} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_docker2_amzn2 ;; ubuntu20.04) install_nvidia_docker2_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 
;; esac echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" 2023-01-11T21:20:33.5923317Z retry_wait_seconds: 10 2023-01-11T21:20:33.5923606Z polling_interval_seconds: 1 2023-01-11T21:20:33.5923880Z warning_on_retry: true 2023-01-11T21:20:33.5924126Z continue_on_error: false 2023-01-11T21:20:33.5924374Z env: 2023-01-11T21:20:33.5924611Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:20:33.5924860Z DRIVER_VERSION: 515.76 2023-01-11T21:20:33.5925109Z ##[endgroup] 2023-01-11T21:20:33.6623932Z == Installing nvidia driver NVIDIA-Linux-x86_64-515.76.run == 2023-01-11T21:20:33.6626534Z + pre_install_nvidia_driver_amzn2 2023-01-11T21:20:33.6627411Z + sudo yum remove -y nvidia-driver-latest-dkms 2023-01-11T21:20:34.2139732Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2023-01-11T21:20:34.2725959Z No Match for argument: nvidia-driver-latest-dkms 2023-01-11T21:20:34.3103863Z No Packages marked for removal 2023-01-11T21:20:34.3264390Z + install_nvidia_driver_common 2023-01-11T21:20:34.3268318Z + echo 'Before installing NVIDIA driver' 2023-01-11T21:20:34.3269113Z + lspci 2023-01-11T21:20:34.3269671Z Before installing NVIDIA driver 2023-01-11T21:20:34.3455319Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) 2023-01-11T21:20:34.3455773Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2023-01-11T21:20:34.3457349Z 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] 2023-01-11T21:20:34.3457794Z 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01) 2023-01-11T21:20:34.3458182Z 00:02.0 VGA compatible controller: Cirrus Logic GD 5446 2023-01-11T21:20:34.3458574Z 00:03.0 Ethernet controller: Amazon.com, Inc. 
Elastic Network Adapter (ENA) 2023-01-11T21:20:34.3458988Z 00:1d.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1) 2023-01-11T21:20:34.3459415Z 00:1e.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1) 2023-01-11T21:20:34.3459871Z 00:1f.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01) 2023-01-11T21:20:34.3460173Z + lsmod 2023-01-11T21:20:34.3477852Z Module Size Used by 2023-01-11T21:20:34.3478176Z nvidia_modeset 1142784 0 2023-01-11T21:20:34.3478452Z nvidia_uvm 1269760 0 2023-01-11T21:20:34.3478721Z veth 16384 0 2023-01-11T21:20:34.3479006Z nvidia 40808448 2 nvidia_uvm,nvidia_modeset 2023-01-11T21:20:34.3479301Z drm 425984 1 nvidia 2023-01-11T21:20:34.3479578Z i2c_core 77824 2 nvidia,drm 2023-01-11T21:20:34.3479862Z backlight 16384 1 nvidia_modeset 2023-01-11T21:20:34.3480144Z xt_conntrack 16384 1 2023-01-11T21:20:34.3482333Z ipt_MASQUERADE 16384 1 2023-01-11T21:20:34.3483000Z nf_nat_masquerade_ipv4 16384 1 ipt_MASQUERADE 2023-01-11T21:20:34.3483683Z nf_conntrack_netlink 49152 0 2023-01-11T21:20:34.3484150Z nfnetlink 16384 2 nf_conntrack_netlink 2023-01-11T21:20:34.3484420Z xfrm_user 45056 1 2023-01-11T21:20:34.3484701Z xfrm_algo 16384 1 xfrm_user 2023-01-11T21:20:34.3485009Z xt_addrtype 16384 2 2023-01-11T21:20:34.3485269Z iptable_filter 16384 1 2023-01-11T21:20:34.3485535Z iptable_nat 16384 1 2023-01-11T21:20:34.3486096Z nf_conntrack_ipv4 16384 3 2023-01-11T21:20:34.3486659Z nf_defrag_ipv4 16384 1 nf_conntrack_ipv4 2023-01-11T21:20:34.3487145Z nf_nat_ipv4 16384 1 iptable_nat 2023-01-11T21:20:34.3487698Z nf_nat 36864 2 nf_nat_masquerade_ipv4,nf_nat_ipv4 2023-01-11T21:20:34.3488180Z nf_conntrack 155648 7 xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4,nf_nat,ipt_MASQUERADE,nf_nat_ipv4,nf_conntrack_netlink 2023-01-11T21:20:34.3488570Z br_netfilter 24576 0 2023-01-11T21:20:34.3488861Z bridge 172032 1 br_netfilter 2023-01-11T21:20:34.3489139Z stp 16384 1 bridge 
2023-01-11T21:20:34.3489398Z llc 16384 2 bridge,stp 2023-01-11T21:20:34.3489668Z overlay 86016 0 2023-01-11T21:20:34.3489928Z sunrpc 393216 1 2023-01-11T21:20:34.3490174Z dm_mirror 28672 0 2023-01-11T21:20:34.3490449Z dm_region_hash 20480 1 dm_mirror 2023-01-11T21:20:34.3490765Z dm_log 20480 2 dm_region_hash,dm_mirror 2023-01-11T21:20:34.3491079Z dm_mod 143360 2 dm_log,dm_mirror 2023-01-11T21:20:34.3491337Z dax 69632 1 dm_mod 2023-01-11T21:20:34.3491595Z sb_edac 24576 0 2023-01-11T21:20:34.3492006Z crc32_pclmul 16384 0 2023-01-11T21:20:34.3492460Z ghash_clmulni_intel 16384 0 2023-01-11T21:20:34.3492750Z pcbc 16384 0 2023-01-11T21:20:34.3493470Z aesni_intel 188416 0 2023-01-11T21:20:34.3493983Z ata_piix 36864 0 2023-01-11T21:20:34.3494655Z aes_x86_64 20480 1 aesni_intel 2023-01-11T21:20:34.3495249Z libata 266240 1 ata_piix 2023-01-11T21:20:34.3495758Z crypto_simd 16384 1 aesni_intel 2023-01-11T21:20:34.3496103Z glue_helper 16384 1 aesni_intel 2023-01-11T21:20:34.3496575Z cryptd 28672 3 crypto_simd,ghash_clmulni_intel,aesni_intel 2023-01-11T21:20:34.3496923Z pcc_cpufreq 16384 0 2023-01-11T21:20:34.3497173Z mousedev 24576 0 2023-01-11T21:20:34.3497425Z evdev 20480 3 2023-01-11T21:20:34.3497688Z scsi_mod 245760 1 libata 2023-01-11T21:20:34.3497939Z psmouse 32768 0 2023-01-11T21:20:34.3498195Z button 16384 0 2023-01-11T21:20:34.3498442Z ena 114688 0 2023-01-11T21:20:34.3498679Z xen_blkfront 49152 2 2023-01-11T21:20:34.3498948Z crc32c_intel 24576 0 2023-01-11T21:20:34.3499200Z autofs4 49152 2 2023-01-11T21:20:34.3499443Z + modinfo nvidia 2023-01-11T21:20:34.3499909Z filename: /lib/modules/4.14.252-195.483.amzn2.x86_64/kernel/drivers/video/nvidia.ko 2023-01-11T21:20:34.3500261Z firmware: nvidia/515.76/gsp.bin 2023-01-11T21:20:34.3500604Z alias: char-major-195-* 2023-01-11T21:20:34.3500860Z version: 515.76 2023-01-11T21:20:34.3501120Z supported: external 2023-01-11T21:20:34.3501374Z license: NVIDIA 2023-01-11T21:20:34.3501635Z srcversion: 51FD9DD90150B35351AFFBB 
2023-01-11T21:20:34.3501951Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2023-01-11T21:20:34.3502266Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2023-01-11T21:20:34.3502560Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2023-01-11T21:20:34.3502899Z depends: i2c-core,drm 2023-01-11T21:20:34.3503167Z retpoline: Y 2023-01-11T21:20:34.3503396Z name: nvidia 2023-01-11T21:20:34.3503798Z vermagic: 4.14.252-195.483.amzn2.x86_64 SMP mod_unload modversions 2023-01-11T21:20:34.3504178Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2023-01-11T21:20:34.3504555Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2023-01-11T21:20:34.3504929Z parm: NVreg_ResmanDebugLevel:int 2023-01-11T21:20:34.3505230Z parm: NVreg_RmLogonRC:int 2023-01-11T21:20:34.3505540Z parm: NVreg_ModifyDeviceFiles:int 2023-01-11T21:20:34.3505833Z parm: NVreg_DeviceFileUID:int 2023-01-11T21:20:34.3506239Z parm: NVreg_DeviceFileGID:int 2023-01-11T21:20:34.3506542Z parm: NVreg_DeviceFileMode:int 2023-01-11T21:20:34.3506888Z parm: NVreg_InitializeSystemMemoryAllocations:int 2023-01-11T21:20:34.3507263Z parm: NVreg_UsePageAttributeTable:int 2023-01-11T21:20:34.3507585Z parm: NVreg_EnablePCIeGen3:int 2023-01-11T21:20:34.3507870Z parm: NVreg_EnableMSI:int 2023-01-11T21:20:34.3508166Z parm: NVreg_TCEBypassMode:int 2023-01-11T21:20:34.3508483Z parm: NVreg_EnableStreamMemOPs:int 2023-01-11T21:20:34.3508828Z parm: NVreg_RestrictProfilingToAdminUsers:int 2023-01-11T21:20:34.3509220Z parm: NVreg_PreserveVideoMemoryAllocations:int 2023-01-11T21:20:34.3509594Z parm: NVreg_EnableS0ixPowerManagement:int 2023-01-11T21:20:34.3509979Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2023-01-11T21:20:34.3510377Z parm: NVreg_DynamicPowerManagement:int 2023-01-11T21:20:34.3510795Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2023-01-11T21:20:34.3511192Z parm: NVreg_EnableGpuFirmware:int 2023-01-11T21:20:34.3511509Z parm: NVreg_EnableGpuFirmwareLogs:int 2023-01-11T21:20:34.3511870Z parm: 
NVreg_OpenRmEnableUnsupportedGpus:int 2023-01-11T21:20:34.3512240Z parm: NVreg_EnableUserNUMAManagement:int 2023-01-11T21:20:34.3512559Z parm: NVreg_MemoryPoolSize:int 2023-01-11T21:20:34.3512878Z parm: NVreg_KMallocHeapMaxSize:int 2023-01-11T21:20:34.3513204Z parm: NVreg_VMallocHeapMaxSize:int 2023-01-11T21:20:34.3513502Z parm: NVreg_IgnoreMMIOCheck:int 2023-01-11T21:20:34.3513811Z parm: NVreg_NvLinkDisable:int 2023-01-11T21:20:34.3514158Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2023-01-11T21:20:34.3514557Z parm: NVreg_RegisterPCIDriver:int 2023-01-11T21:20:34.3514897Z parm: NVreg_EnableDbgBreakpoint:int 2023-01-11T21:20:34.3515218Z parm: NVreg_RegistryDwords:charp 2023-01-11T21:20:34.3515561Z parm: NVreg_RegistryDwordsPerDevice:charp 2023-01-11T21:20:34.3515861Z parm: NVreg_RmMsg:charp 2023-01-11T21:20:34.3516157Z parm: NVreg_GpuBlacklist:charp 2023-01-11T21:20:34.3516479Z parm: NVreg_TemporaryFilePath:charp 2023-01-11T21:20:34.3516778Z parm: NVreg_ExcludedGpus:charp 2023-01-11T21:20:34.3517088Z parm: NVreg_DmaRemapPeerMmio:int 2023-01-11T21:20:34.3517396Z parm: rm_firmware_active:charp 2023-01-11T21:20:34.3517660Z + HAS_NVIDIA_DRIVER=0 2023-01-11T21:20:34.3517980Z ++ command -v nvidia-smi 2023-01-11T21:20:34.3518299Z + '[' -x /usr/bin/nvidia-smi ']' 2023-01-11T21:20:34.3518548Z + set +e 2023-01-11T21:20:34.3518944Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2023-01-11T21:20:38.3983329Z + INSTALLED_DRIVER_VERSION=515.76 2023-01-11T21:20:38.3983675Z + NVIDIA_SMI_STATUS=0 2023-01-11T21:20:38.3984098Z + '[' 0 -ne 0 ']' 2023-01-11T21:20:38.3984413Z + '[' 515.76 '!=' 515.76 ']' 2023-01-11T21:20:38.3984652Z + HAS_NVIDIA_DRIVER=1 2023-01-11T21:20:38.3985131Z + echo 'NVIDIA driver (515.76) has already been installed. Skipping NVIDIA driver installation' 2023-01-11T21:20:38.3985493Z + set -e 2023-01-11T21:20:38.3985737Z + '[' 1 -eq 0 ']' 2023-01-11T21:20:38.3986090Z NVIDIA driver (515.76) has already been installed. 
Skipping NVIDIA driver installation 2023-01-11T21:20:38.3986460Z + post_install_nvidia_driver_common 2023-01-11T21:20:38.3988645Z + sudo modprobe nvidia 2023-01-11T21:20:38.4137049Z + echo 'After installing NVIDIA driver' 2023-01-11T21:20:38.4137329Z + lspci 2023-01-11T21:20:38.4137581Z After installing NVIDIA driver 2023-01-11T21:20:38.4334671Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) 2023-01-11T21:20:38.4335116Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2023-01-11T21:20:38.4335525Z 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] 2023-01-11T21:20:38.4336151Z 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01) 2023-01-11T21:20:38.4336499Z 00:02.0 VGA compatible controller: Cirrus Logic GD 5446 2023-01-11T21:20:38.4336900Z 00:03.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2023-01-11T21:20:38.4337322Z 00:1d.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1) 2023-01-11T21:20:38.4337726Z 00:1e.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1) 2023-01-11T21:20:38.4338144Z 00:1f.0 Unassigned class [ff80]: XenSource, Inc. 
Xen Platform Device (rev 01) 2023-01-11T21:20:38.4338458Z + lsmod 2023-01-11T21:20:38.4357088Z Module Size Used by 2023-01-11T21:20:38.4357361Z nvidia_modeset 1142784 0 2023-01-11T21:20:38.4357625Z nvidia_uvm 1269760 0 2023-01-11T21:20:38.4357887Z veth 16384 0 2023-01-11T21:20:38.4358173Z nvidia 40808448 2 nvidia_uvm,nvidia_modeset 2023-01-11T21:20:38.4358469Z drm 425984 1 nvidia 2023-01-11T21:20:38.4358743Z i2c_core 77824 2 nvidia,drm 2023-01-11T21:20:38.4359089Z backlight 16384 1 nvidia_modeset 2023-01-11T21:20:38.4359485Z xt_conntrack 16384 1 2023-01-11T21:20:38.4359859Z ipt_MASQUERADE 16384 1 2023-01-11T21:20:38.4360385Z nf_nat_masquerade_ipv4 16384 1 ipt_MASQUERADE 2023-01-11T21:20:38.4360767Z nf_conntrack_netlink 49152 0 2023-01-11T21:20:38.4361081Z nfnetlink 16384 2 nf_conntrack_netlink 2023-01-11T21:20:38.4361429Z xfrm_user 45056 1 2023-01-11T21:20:38.4361787Z xfrm_algo 16384 1 xfrm_user 2023-01-11T21:20:38.4362073Z xt_addrtype 16384 2 2023-01-11T21:20:38.4362388Z iptable_filter 16384 1 2023-01-11T21:20:38.4362717Z iptable_nat 16384 1 2023-01-11T21:20:38.4362994Z nf_conntrack_ipv4 16384 3 2023-01-11T21:20:38.4363514Z nf_defrag_ipv4 16384 1 nf_conntrack_ipv4 2023-01-11T21:20:38.4363914Z nf_nat_ipv4 16384 1 iptable_nat 2023-01-11T21:20:38.4364264Z nf_nat 36864 2 nf_nat_masquerade_ipv4,nf_nat_ipv4 2023-01-11T21:20:38.4364784Z nf_conntrack 155648 7 xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4,nf_nat,ipt_MASQUERADE,nf_nat_ipv4,nf_conntrack_netlink 2023-01-11T21:20:38.4365242Z br_netfilter 24576 0 2023-01-11T21:20:38.4365619Z bridge 172032 1 br_netfilter 2023-01-11T21:20:38.4365912Z stp 16384 1 bridge 2023-01-11T21:20:38.4366238Z llc 16384 2 bridge,stp 2023-01-11T21:20:38.4366556Z overlay 86016 0 2023-01-11T21:20:38.4366823Z sunrpc 393216 1 2023-01-11T21:20:38.4367185Z dm_mirror 28672 0 2023-01-11T21:20:38.4367510Z dm_region_hash 20480 1 dm_mirror 2023-01-11T21:20:38.4367825Z dm_log 20480 2 dm_region_hash,dm_mirror 2023-01-11T21:20:38.4368185Z dm_mod 
143360 2 dm_log,dm_mirror 2023-01-11T21:20:38.4368556Z dax 69632 1 dm_mod 2023-01-11T21:20:38.4368834Z sb_edac 24576 0 2023-01-11T21:20:38.4369139Z crc32_pclmul 16384 0 2023-01-11T21:20:38.4369457Z ghash_clmulni_intel 16384 0 2023-01-11T21:20:38.4369735Z pcbc 16384 0 2023-01-11T21:20:38.4370103Z aesni_intel 188416 0 2023-01-11T21:20:38.4370410Z ata_piix 36864 0 2023-01-11T21:20:38.4370725Z aes_x86_64 20480 1 aesni_intel 2023-01-11T21:20:38.4371011Z libata 266240 1 ata_piix 2023-01-11T21:20:38.4371364Z crypto_simd 16384 1 aesni_intel 2023-01-11T21:20:38.4371745Z glue_helper 16384 1 aesni_intel 2023-01-11T21:20:38.4372105Z cryptd 28672 3 crypto_simd,ghash_clmulni_intel,aesni_intel 2023-01-11T21:20:38.4372488Z pcc_cpufreq 16384 0 2023-01-11T21:20:38.4372816Z mousedev 24576 0 2023-01-11T21:20:38.4373717Z evdev 20480 3 2023-01-11T21:20:38.4374060Z scsi_mod 245760 1 libata 2023-01-11T21:20:38.4374501Z psmouse 32768 0 2023-01-11T21:20:38.4374765Z button 16384 0 2023-01-11T21:20:38.4375060Z ena 114688 0 2023-01-11T21:20:38.4375405Z xen_blkfront 49152 2 2023-01-11T21:20:38.4375671Z crc32c_intel 24576 0 2023-01-11T21:20:38.4375979Z autofs4 49152 2 2023-01-11T21:20:38.4376314Z + modinfo nvidia 2023-01-11T21:20:38.4376917Z filename: /lib/modules/4.14.252-195.483.amzn2.x86_64/kernel/drivers/video/nvidia.ko 2023-01-11T21:20:38.4377292Z firmware: nvidia/515.76/gsp.bin 2023-01-11T21:20:38.4377712Z alias: char-major-195-* 2023-01-11T21:20:38.4378033Z version: 515.76 2023-01-11T21:20:38.4378295Z supported: external 2023-01-11T21:20:38.4378632Z license: NVIDIA 2023-01-11T21:20:38.4378969Z srcversion: 51FD9DD90150B35351AFFBB 2023-01-11T21:20:38.4379297Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2023-01-11T21:20:38.4379669Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2023-01-11T21:20:38.4380050Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2023-01-11T21:20:38.4380442Z depends: i2c-core,drm 2023-01-11T21:20:38.4380755Z retpoline: Y 2023-01-11T21:20:38.4381048Z name: nvidia 
2023-01-11T21:20:38.4381458Z vermagic: 4.14.252-195.483.amzn2.x86_64 SMP mod_unload modversions 2023-01-11T21:20:38.4381900Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2023-01-11T21:20:38.4382385Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2023-01-11T21:20:38.4382801Z parm: NVreg_ResmanDebugLevel:int 2023-01-11T21:20:38.4383104Z parm: NVreg_RmLogonRC:int 2023-01-11T21:20:38.4383491Z parm: NVreg_ModifyDeviceFiles:int 2023-01-11T21:20:38.4383889Z parm: NVreg_DeviceFileUID:int 2023-01-11T21:20:38.4384280Z parm: NVreg_DeviceFileGID:int 2023-01-11T21:20:38.4384647Z parm: NVreg_DeviceFileMode:int 2023-01-11T21:20:38.4385070Z parm: NVreg_InitializeSystemMemoryAllocations:int 2023-01-11T21:20:38.4385461Z parm: NVreg_UsePageAttributeTable:int 2023-01-11T21:20:38.4385872Z parm: NVreg_EnablePCIeGen3:int 2023-01-11T21:20:38.4413538Z parm: NVreg_EnableMSI:int 2023-01-11T21:20:38.4413844Z parm: NVreg_TCEBypassMode:int 2023-01-11T21:20:38.4414169Z parm: NVreg_EnableStreamMemOPs:int 2023-01-11T21:20:38.4414532Z parm: NVreg_RestrictProfilingToAdminUsers:int 2023-01-11T21:20:38.4414915Z parm: NVreg_PreserveVideoMemoryAllocations:int 2023-01-11T21:20:38.4415294Z parm: NVreg_EnableS0ixPowerManagement:int 2023-01-11T21:20:38.4415706Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2023-01-11T21:20:38.4416089Z parm: NVreg_DynamicPowerManagement:int 2023-01-11T21:20:38.4416511Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2023-01-11T21:20:38.4416910Z parm: NVreg_EnableGpuFirmware:int 2023-01-11T21:20:38.4417246Z parm: NVreg_EnableGpuFirmwareLogs:int 2023-01-11T21:20:38.4417592Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2023-01-11T21:20:38.4417960Z parm: NVreg_EnableUserNUMAManagement:int 2023-01-11T21:20:38.4418291Z parm: NVreg_MemoryPoolSize:int 2023-01-11T21:20:38.4418591Z parm: NVreg_KMallocHeapMaxSize:int 2023-01-11T21:20:38.4418919Z parm: NVreg_VMallocHeapMaxSize:int 2023-01-11T21:20:38.4419233Z parm: NVreg_IgnoreMMIOCheck:int 
2023-01-11T21:20:38.4419518Z parm: NVreg_NvLinkDisable:int 2023-01-11T21:20:38.4419863Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2023-01-11T21:20:38.4420217Z parm: NVreg_RegisterPCIDriver:int 2023-01-11T21:20:38.4420527Z parm: NVreg_EnableDbgBreakpoint:int 2023-01-11T21:20:38.4420853Z parm: NVreg_RegistryDwords:charp 2023-01-11T21:20:38.4421196Z parm: NVreg_RegistryDwordsPerDevice:charp 2023-01-11T21:20:38.4421517Z parm: NVreg_RmMsg:charp 2023-01-11T21:20:38.4421961Z parm: NVreg_GpuBlacklist:charp 2023-01-11T21:20:38.4422284Z parm: NVreg_TemporaryFilePath:charp 2023-01-11T21:20:38.4422600Z parm: NVreg_ExcludedGpus:charp 2023-01-11T21:20:38.4422896Z parm: NVreg_DmaRemapPeerMmio:int 2023-01-11T21:20:38.4423200Z parm: rm_firmware_active:charp 2023-01-11T21:20:38.4423456Z + set +e 2023-01-11T21:20:38.4423740Z + nvidia-smi 2023-01-11T21:20:41.7501803Z Wed Jan 11 21:20:41 2023 2023-01-11T21:20:41.7502927Z +-----------------------------------------------------------------------------+ 2023-01-11T21:20:41.7503753Z | NVIDIA-SMI 515.76 Driver Version: 515.76 CUDA Version: 11.7 | 2023-01-11T21:20:41.7504245Z |-------------------------------+----------------------+----------------------+ 2023-01-11T21:20:41.7504760Z | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2023-01-11T21:20:41.7505267Z | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2023-01-11T21:20:41.7505634Z | | | MIG M. 
| 2023-01-11T21:20:41.7505935Z |===============================+======================+======================| 2023-01-11T21:20:41.7556864Z | 0 Tesla M60 Off | 00000000:00:1D.0 Off | 8030846975 | 2023-01-11T21:20:41.7557606Z | N/A 29C P0 39W / 150W | 0MiB / 7680MiB | 0% Default | 2023-01-11T21:20:41.7558212Z | | | N/A | 2023-01-11T21:20:41.7558682Z +-------------------------------+----------------------+----------------------+ 2023-01-11T21:20:41.7607965Z | 1 Tesla M60 Off | 00000000:00:1E.0 Off | 16620751025 | 2023-01-11T21:20:41.7608569Z | N/A 35C P0 38W / 150W | 0MiB / 7680MiB | 99% Default | 2023-01-11T21:20:41.7609626Z | | | N/A | 2023-01-11T21:20:41.7610165Z +-------------------------------+----------------------+----------------------+ 2023-01-11T21:20:41.7610531Z 2023-01-11T21:20:41.7610976Z +-----------------------------------------------------------------------------+ 2023-01-11T21:20:41.7611353Z | Processes: | 2023-01-11T21:20:41.7611682Z | GPU GI CI PID Type Process name GPU Memory | 2023-01-11T21:20:41.7612018Z | ID ID Usage | 2023-01-11T21:20:41.7612316Z |=============================================================================| 2023-01-11T21:20:41.7613729Z | No running processes found | 2023-01-11T21:20:41.7614765Z +-----------------------------------------------------------------------------+ 2023-01-11T21:20:41.8037019Z + NVIDIA_SMI_STATUS=0 2023-01-11T21:20:41.8037652Z + '[' 0 -eq 0 ']' 2023-01-11T21:20:41.8038349Z + echo 'INFO: Ignoring allowed status 0' 2023-01-11T21:20:41.8038970Z + set -e 2023-01-11T21:20:41.8039377Z INFO: Ignoring allowed status 0 2023-01-11T21:20:41.8044151Z == Installing nvidia container toolkit for amzn2 == 2023-01-11T21:20:41.8048488Z + sudo yum install -y yum-utils 2023-01-11T21:20:42.3558758Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2023-01-11T21:20:43.7867909Z Package yum-utils-1.1.31-46.amzn2.0.1.noarch already installed and latest version 2023-01-11T21:20:43.7868453Z Nothing to do 
2023-01-11T21:20:43.8710911Z + sudo yum-config-manager --add-repo https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo 2023-01-11T21:20:44.4377033Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2023-01-11T21:20:44.4721955Z adding repo from: https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo 2023-01-11T21:20:44.4723244Z grabbing file https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo to /etc/yum.repos.d/nvidia-docker.repo 2023-01-11T21:20:44.4724031Z repo saved to /etc/yum.repos.d/nvidia-docker.repo 2023-01-11T21:20:44.4871189Z + sudo yum install -y nvidia-docker2 2023-01-11T21:20:45.0435075Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2023-01-11T21:20:46.3806921Z Package nvidia-docker2-2.11.0-1.noarch already installed and latest version 2023-01-11T21:20:46.3807467Z Nothing to do 2023-01-11T21:20:46.4620890Z + sudo systemctl restart docker 2023-01-11T21:21:07.7006454Z Command completed after 1 attempt(s). 
2023-01-11T21:21:07.7071904Z ##[group]Run python3 -m pip install psutil==5.9.1 2023-01-11T21:21:07.7072335Z python3 -m pip install psutil==5.9.1 2023-01-11T21:21:07.7072689Z python3 -m pip install pynvml==11.4.1 2023-01-11T21:21:07.7073074Z python3 -m tools.stats.monitor > usage_log.txt 2>&1 & 2023-01-11T21:21:07.7073484Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}" 2023-01-11T21:21:07.7087940Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:21:07.7088267Z env: 2023-01-11T21:21:07.7088538Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:21:07.7088806Z GPU_FLAG: --gpus all 2023-01-11T21:21:07.7089082Z ##[endgroup] 2023-01-11T21:21:08.0150820Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T21:21:08.0378236Z Requirement already satisfied: psutil==5.9.1 in /home/ec2-user/.local/lib/python3.7/site-packages (5.9.1) 2023-01-11T21:21:08.6084891Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T21:21:08.6310775Z Requirement already satisfied: pynvml==11.4.1 in /home/ec2-user/.local/lib/python3.7/site-packages (11.4.1) 2023-01-11T21:21:08.9137696Z Prepare all required actions 2023-01-11T21:21:08.9138084Z Getting action download info 2023-01-11T21:21:09.1708373Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:4a8bfae15cc25cc0785c1603ee87a9da8fd442ea) 2023-01-11T21:21:09.3735994Z Download action repository 'actions/download-artifact@v3' (SHA:9bc31d5ccc31df68ecc42ccf4149144866c47d8a) 2023-01-11T21:21:09.5220652Z ##[group]Run ./.github/actions/download-build-artifacts 2023-01-11T21:21:09.5220960Z with: 2023-01-11T21:21:09.5221229Z name: linux-bionic-cuda11.7-py3.10-gcc7 2023-01-11T21:21:09.5221514Z env: 2023-01-11T21:21:09.5221759Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:21:09.5222013Z GPU_FLAG: --gpus all 2023-01-11T21:21:09.5222264Z ##[endgroup] 2023-01-11T21:21:09.5252770Z ##[group]Run seemethere/download-artifact-s3@v4 2023-01-11T21:21:09.5253576Z 
with: 2023-01-11T21:21:09.5253844Z name: linux-bionic-cuda11.7-py3.10-gcc7 2023-01-11T21:21:09.5254152Z s3-bucket: gha-artifacts 2023-01-11T21:21:09.5254487Z region: us-east-1 2023-01-11T21:21:09.5254709Z env: 2023-01-11T21:21:09.5254947Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:21:09.5255212Z GPU_FLAG: --gpus all 2023-01-11T21:21:09.5255444Z ##[endgroup] 2023-01-11T21:21:10.0655947Z Found 1 objects with prefix pytorch/pytorch/3896346758/linux-bionic-cuda11.7-py3.10-gcc7/ 2023-01-11T21:21:10.0656608Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2023-01-11T21:21:16.4536453Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2023-01-11T21:21:16.4536777Z 2023-01-11T21:21:16.4557314Z ##[warning]The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/ 2023-01-11T21:21:16.4567569Z Artifact download has finished successfully 2023-01-11T21:21:16.4713690Z ##[group]Run unzip -o artifacts.zip 2023-01-11T21:21:16.4714026Z unzip -o artifacts.zip 2023-01-11T21:21:16.4727439Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:21:16.4727725Z env: 2023-01-11T21:21:16.4727972Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:21:16.4728249Z GPU_FLAG: --gpus all 2023-01-11T21:21:16.4728643Z ##[endgroup] 2023-01-11T21:21:16.4771618Z Archive: artifacts.zip 2023-01-11T21:21:16.4774267Z creating: dist/ 2023-01-11T21:21:18.5630771Z inflating: dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl 2023-01-11T21:21:18.5631200Z creating: build/custom_test_artifacts/ 2023-01-11T21:21:18.5631636Z creating: build/custom_test_artifacts/custom-op-build/ 2023-01-11T21:21:18.5632095Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2023-01-11T21:21:18.5638891Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeOutput.log 2023-01-11T21:21:18.5639435Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/ 2023-01-11T21:21:18.5640015Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2023-01-11T21:21:18.5640562Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/ 2023-01-11T21:21:18.5641121Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2023-01-11T21:21:18.5643465Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2023-01-11T21:21:18.5644528Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2023-01-11T21:21:18.5645072Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2023-01-11T21:21:18.5645638Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2023-01-11T21:21:18.5648213Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2023-01-11T21:21:18.5649648Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2023-01-11T21:21:18.5650960Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2023-01-11T21:21:18.5651919Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2023-01-11T21:21:18.5653788Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2023-01-11T21:21:18.5654612Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2023-01-11T21:21:18.5655208Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2023-01-11T21:21:18.5655782Z creating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2023-01-11T21:21:18.5709919Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2023-01-11T21:21:18.5710648Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2023-01-11T21:21:18.5711369Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2023-01-11T21:21:18.5712107Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2023-01-11T21:21:18.5712817Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2023-01-11T21:21:18.5713515Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2023-01-11T21:21:18.5714212Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2023-01-11T21:21:18.5714912Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2023-01-11T21:21:18.5715783Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2023-01-11T21:21:18.5757774Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2023-01-11T21:21:18.5799395Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2023-01-11T21:21:18.5800334Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2023-01-11T21:21:18.5801175Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2023-01-11T21:21:18.5801831Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2023-01-11T21:21:18.5802474Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2023-01-11T21:21:18.5803379Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2023-01-11T21:21:18.5804401Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2023-01-11T21:21:18.5805922Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2023-01-11T21:21:18.5879642Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2023-01-11T21:21:18.5952760Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2023-01-11T21:21:18.5953385Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2023-01-11T21:21:18.5954282Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2023-01-11T21:21:18.5955348Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeError.log 2023-01-11T21:21:18.5955938Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2023-01-11T21:21:18.5956659Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2023-01-11T21:21:18.5957373Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2023-01-11T21:21:18.5958006Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2023-01-11T21:21:18.5958615Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 
2023-01-11T21:21:18.5959195Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2023-01-11T21:21:18.5959770Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2023-01-11T21:21:18.5960865Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2023-01-11T21:21:18.5961594Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2023-01-11T21:21:18.5962229Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2023-01-11T21:21:18.5962836Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2023-01-11T21:21:18.5983906Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2023-01-11T21:21:18.6101407Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2023-01-11T21:21:18.6101974Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2023-01-11T21:21:18.6102580Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2023-01-11T21:21:18.6103210Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2023-01-11T21:21:18.6103830Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2023-01-11T21:21:18.6104433Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2023-01-11T21:21:18.6105212Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2023-01-11T21:21:18.6106001Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2023-01-11T21:21:18.6106604Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2023-01-11T21:21:18.6107215Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2023-01-11T21:21:18.6107819Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2023-01-11T21:21:18.6128853Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2023-01-11T21:21:18.6215369Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2023-01-11T21:21:18.6216022Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2023-01-11T21:21:18.6216636Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2023-01-11T21:21:18.6217203Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2023-01-11T21:21:18.6218052Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2023-01-11T21:21:18.6219294Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2023-01-11T21:21:18.6219969Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2023-01-11T21:21:18.6222782Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2023-01-11T21:21:18.6223465Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2023-01-11T21:21:18.6224155Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2023-01-11T21:21:18.6318963Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2023-01-11T21:21:18.6383685Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2023-01-11T21:21:18.6384147Z creating: build/custom_test_artifacts/jit-hook-build/ 2023-01-11T21:21:18.6384625Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 
2023-01-11T21:21:18.6391329Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeOutput.log 2023-01-11T21:21:18.6391866Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/ 2023-01-11T21:21:18.6392387Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2023-01-11T21:21:18.6392956Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/ 2023-01-11T21:21:18.6393507Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2023-01-11T21:21:18.6395518Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2023-01-11T21:21:18.6396739Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2023-01-11T21:21:18.6397304Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2023-01-11T21:21:18.6397863Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2023-01-11T21:21:18.6400293Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2023-01-11T21:21:18.6401460Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2023-01-11T21:21:18.6403226Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2023-01-11T21:21:18.6403846Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2023-01-11T21:21:18.6405274Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2023-01-11T21:21:18.6406323Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2023-01-11T21:21:18.6406907Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2023-01-11T21:21:18.6407466Z creating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2023-01-11T21:21:18.6461785Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2023-01-11T21:21:18.6462513Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2023-01-11T21:21:18.6463238Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2023-01-11T21:21:18.6463974Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2023-01-11T21:21:18.6464697Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2023-01-11T21:21:18.6465388Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2023-01-11T21:21:18.6466053Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2023-01-11T21:21:18.6466737Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2023-01-11T21:21:18.6467689Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2023-01-11T21:21:18.6509819Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2023-01-11T21:21:18.6550904Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2023-01-11T21:21:18.6552077Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2023-01-11T21:21:18.6553115Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2023-01-11T21:21:18.6553795Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2023-01-11T21:21:18.6554435Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2023-01-11T21:21:18.6555228Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2023-01-11T21:21:18.6556075Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2023-01-11T21:21:18.6557921Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2023-01-11T21:21:18.6631309Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2023-01-11T21:21:18.6704491Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2023-01-11T21:21:18.6705357Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2023-01-11T21:21:18.6705922Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2023-01-11T21:21:18.6706627Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeError.log 2023-01-11T21:21:18.6707197Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2023-01-11T21:21:18.6707745Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2023-01-11T21:21:18.6708317Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2023-01-11T21:21:18.6709093Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2023-01-11T21:21:18.6709694Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 
2023-01-11T21:21:18.6710276Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2023-01-11T21:21:18.6710848Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2023-01-11T21:21:18.6711605Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2023-01-11T21:21:18.6712226Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2023-01-11T21:21:18.6712826Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2023-01-11T21:21:18.6713401Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2023-01-11T21:21:18.6734509Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2023-01-11T21:21:18.6801485Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2023-01-11T21:21:18.6802117Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2023-01-11T21:21:18.6802703Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2023-01-11T21:21:18.6803265Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2023-01-11T21:21:18.6804064Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2023-01-11T21:21:18.6805094Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2023-01-11T21:21:18.6805610Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2023-01-11T21:21:18.6808501Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2023-01-11T21:21:18.6809365Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2023-01-11T21:21:18.6810015Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 
2023-01-11T21:21:18.6861447Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2023-01-11T21:21:18.6861931Z creating: build/custom_test_artifacts/custom-backend-build/ 2023-01-11T21:21:18.6862450Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2023-01-11T21:21:18.6869160Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeOutput.log 2023-01-11T21:21:18.6869713Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/ 2023-01-11T21:21:18.6870287Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2023-01-11T21:21:18.6870873Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/ 2023-01-11T21:21:18.6871456Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2023-01-11T21:21:18.6873479Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2023-01-11T21:21:18.6875507Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2023-01-11T21:21:18.6876103Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2023-01-11T21:21:18.6876686Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2023-01-11T21:21:18.6878756Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2023-01-11T21:21:18.6879870Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2023-01-11T21:21:18.6881643Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2023-01-11T21:21:18.6882414Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2023-01-11T21:21:18.6883657Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2023-01-11T21:21:18.6884723Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2023-01-11T21:21:18.6885338Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2023-01-11T21:21:18.6885932Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2023-01-11T21:21:18.6940174Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2023-01-11T21:21:18.6940934Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2023-01-11T21:21:18.6941684Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2023-01-11T21:21:18.6942447Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2023-01-11T21:21:18.6943196Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2023-01-11T21:21:18.6943897Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2023-01-11T21:21:18.6944742Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2023-01-11T21:21:18.6945489Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2023-01-11T21:21:18.6946212Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2023-01-11T21:21:18.6987979Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2023-01-11T21:21:18.7029259Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2023-01-11T21:21:18.7030325Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2023-01-11T21:21:18.7031228Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2023-01-11T21:21:18.7031908Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2023-01-11T21:21:18.7032552Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2023-01-11T21:21:18.7033360Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2023-01-11T21:21:18.7034281Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2023-01-11T21:21:18.7036266Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2023-01-11T21:21:18.7109665Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2023-01-11T21:21:18.7182772Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2023-01-11T21:21:18.7183446Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2023-01-11T21:21:18.7184034Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2023-01-11T21:21:18.7184819Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeError.log 2023-01-11T21:21:18.7185529Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2023-01-11T21:21:18.7186086Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2023-01-11T21:21:18.7186710Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2023-01-11T21:21:18.7187362Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2023-01-11T21:21:18.7188002Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2023-01-11T21:21:18.7188603Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2023-01-11T21:21:18.7189232Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2023-01-11T21:21:18.7189864Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2023-01-11T21:21:18.7190499Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2023-01-11T21:21:18.7191109Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2023-01-11T21:21:18.7191734Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2023-01-11T21:21:18.7196171Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2023-01-11T21:21:18.7349636Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2023-01-11T21:21:18.7350288Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2023-01-11T21:21:18.7350905Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2023-01-11T21:21:18.7351581Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2023-01-11T21:21:18.7352229Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2023-01-11T21:21:18.7352865Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2023-01-11T21:21:18.7353488Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2023-01-11T21:21:18.7354131Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2023-01-11T21:21:18.7354779Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2023-01-11T21:21:18.7355429Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2023-01-11T21:21:18.7356052Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2023-01-11T21:21:18.7376905Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2023-01-11T21:21:18.7438168Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2023-01-11T21:21:18.7438853Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2023-01-11T21:21:18.7439492Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2023-01-11T21:21:18.7440066Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2023-01-11T21:21:18.7440787Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2023-01-11T21:21:18.7441849Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2023-01-11T21:21:18.7442553Z 
inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2023-01-11T21:21:18.7445256Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2023-01-11T21:21:18.7445970Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2023-01-11T21:21:18.7446711Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2023-01-11T21:21:18.7569379Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2023-01-11T21:21:18.7617107Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2023-01-11T21:21:18.7617476Z creating: build/lib/ 2023-01-11T21:21:18.7618099Z inflating: build/lib/libclog.a 2023-01-11T21:21:18.7687678Z inflating: build/lib/libgtest.a 2023-01-11T21:21:18.7698169Z inflating: build/lib/libpthreadpool.a 2023-01-11T21:21:18.7794473Z inflating: build/lib/libbenchmark.a 2023-01-11T21:21:18.7803565Z inflating: build/lib/libittnotify.a 2023-01-11T21:21:18.7909628Z inflating: build/lib/libprotobuf-lite.a 2023-01-11T21:21:18.7941479Z inflating: build/lib/libtensorpipe_uv.a 2023-01-11T21:21:18.8017999Z inflating: build/lib/libasmjit.a 2023-01-11T21:21:18.8550980Z inflating: build/lib/libprotobuf.a 2023-01-11T21:21:18.8691491Z inflating: build/lib/libgloo.a 2023-01-11T21:21:18.8724191Z inflating: build/lib/libfmt.a 2023-01-11T21:21:18.8724760Z inflating: build/lib/libfoxi_loader.a 2023-01-11T21:21:18.8726939Z inflating: build/lib/libcaffe2_nvrtc.so 2023-01-11T21:21:18.8809255Z inflating: build/lib/libc10.so 2023-01-11T21:21:18.8811417Z inflating: build/lib/libtorch_global_deps.so 2023-01-11T21:21:18.9380379Z inflating: build/lib/libprotoc.a 2023-01-11T21:21:18.9390045Z inflating: build/lib/libcpuinfo.a 2023-01-11T21:21:18.9392616Z inflating: build/lib/libnnpack_reference_layers.a 2023-01-11T21:21:18.9401437Z inflating: build/lib/libcpuinfo_internals.a 2023-01-11T21:21:18.9419249Z inflating: build/lib/libgmock.a 
2023-01-11T21:21:18.9419809Z inflating: build/lib/libgtest_main.a 2023-01-11T21:21:18.9420741Z inflating: build/lib/libbenchmark_main.a 2023-01-11T21:21:19.0075357Z inflating: build/lib/libtensorpipe.a 2023-01-11T21:21:19.9844358Z inflating: build/lib/libdnnl.a 2023-01-11T21:21:19.9985276Z inflating: build/lib/libXNNPACK.a 2023-01-11T21:21:20.0039855Z inflating: build/lib/libc10_cuda.so 2023-01-11T21:21:20.0055928Z inflating: build/lib/libqnnpack.a 2023-01-11T21:21:20.0056595Z inflating: build/lib/libgmock_main.a 2023-01-11T21:21:20.1598220Z inflating: build/lib/libfbgemm.a 2023-01-11T21:21:20.1621301Z inflating: build/lib/libpytorch_qnnpack.a 2023-01-11T21:21:20.2777701Z inflating: build/lib/libdnnl_graph.a 2023-01-11T21:21:20.3293865Z inflating: build/lib/libkineto.a 2023-01-11T21:21:20.3580494Z inflating: build/lib/libtensorpipe_cuda.a 2023-01-11T21:21:20.3625095Z inflating: build/lib/libcaffe2_protos.a 2023-01-11T21:21:20.3646783Z inflating: build/lib/libnnpack.a 2023-01-11T21:21:20.3694539Z inflating: build/lib/libonnx_proto.a 2023-01-11T21:21:20.4361614Z inflating: build/lib/libonnx.a 2023-01-11T21:21:20.4793924Z inflating: build/lib/libgloo_cuda.a 2023-01-11T21:21:22.8603153Z inflating: build/lib/libtorch_cpu.so 2023-01-11T21:21:22.8613928Z inflating: build/lib/libunbox_lib.a 2023-01-11T21:21:24.9856767Z inflating: build/lib/libtorch_cuda.so 2023-01-11T21:21:24.9857537Z inflating: build/lib/libtorch.so 2023-01-11T21:21:24.9860566Z inflating: build/lib/libc10d_cuda_test.so 2023-01-11T21:21:25.9742006Z inflating: build/lib/libtorch_cuda_linalg.so 2023-01-11T21:21:25.9802934Z inflating: build/lib/libtorchbind_test.so 2023-01-11T21:21:25.9827313Z inflating: build/lib/libjitbackend_test.so 2023-01-11T21:21:25.9858294Z inflating: build/lib/libbackend_with_compiler.so 2023-01-11T21:21:25.9863265Z inflating: build/lib/libshm.so 2023-01-11T21:21:26.1715344Z inflating: build/lib/libtorch_python.so 2023-01-11T21:21:26.1755370Z inflating: build/lib/libnnapi_backend.so 
2023-01-11T21:21:26.1755661Z creating: build/bin/ 2023-01-11T21:21:26.1809960Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2023-01-11T21:21:26.1867589Z inflating: build/bin/c10_DeviceGuard_test 2023-01-11T21:21:26.1923444Z inflating: build/bin/c10_Device_test 2023-01-11T21:21:26.1988169Z inflating: build/bin/c10_DispatchKeySet_test 2023-01-11T21:21:26.2041345Z inflating: build/bin/c10_StreamGuard_test 2023-01-11T21:21:26.2096098Z inflating: build/bin/c10_SymInt_test 2023-01-11T21:21:26.2158226Z inflating: build/bin/c10_InlineDeviceGuard_test 2023-01-11T21:21:26.2220563Z inflating: build/bin/c10_InlineStreamGuard_test 2023-01-11T21:21:26.2283473Z inflating: build/bin/c10_SizesAndStrides_test 2023-01-11T21:21:26.2336765Z inflating: build/bin/c10_Array_test 2023-01-11T21:21:26.2395629Z inflating: build/bin/c10_Bitset_test 2023-01-11T21:21:26.2452313Z inflating: build/bin/c10_C++17_test 2023-01-11T21:21:26.2505771Z inflating: build/bin/c10_ConstexprCrc_test 2023-01-11T21:21:26.2560111Z inflating: build/bin/c10_DeadlockDetection_test 2023-01-11T21:21:26.2615054Z inflating: build/bin/c10_Half_test 2023-01-11T21:21:26.2677732Z inflating: build/bin/c10_LeftRight_test 2023-01-11T21:21:26.2746814Z inflating: build/bin/c10_Metaprogramming_test 2023-01-11T21:21:26.2802114Z inflating: build/bin/c10_Synchronized_test 2023-01-11T21:21:26.2962603Z inflating: build/bin/c10_SmallVectorTest 2023-01-11T21:21:26.3025431Z inflating: build/bin/c10_ThreadLocal_test 2023-01-11T21:21:26.3083960Z inflating: build/bin/c10_TypeIndex_test 2023-01-11T21:21:26.3139793Z inflating: build/bin/c10_TypeList_test 2023-01-11T21:21:26.3193092Z inflating: build/bin/c10_TypeTraits_test 2023-01-11T21:21:26.3251314Z inflating: build/bin/c10_accumulate_test 2023-01-11T21:21:26.3313570Z inflating: build/bin/c10_bfloat16_test 2023-01-11T21:21:26.3375073Z inflating: build/bin/c10_complex_math_test 2023-01-11T21:21:26.3435950Z inflating: build/bin/c10_complex_test 2023-01-11T21:21:26.3555494Z 
inflating: build/bin/c10_either_test 2023-01-11T21:21:26.3614016Z inflating: build/bin/c10_exception_test 2023-01-11T21:21:26.3669300Z inflating: build/bin/c10_flags_test 2023-01-11T21:21:26.3853748Z inflating: build/bin/c10_intrusive_ptr_test 2023-01-11T21:21:26.3909630Z inflating: build/bin/c10_irange_test 2023-01-11T21:21:26.3972644Z inflating: build/bin/c10_logging_test 2023-01-11T21:21:26.4053935Z inflating: build/bin/c10_optional_test 2023-01-11T21:21:26.4122250Z inflating: build/bin/c10_ordered_preserving_dict_test 2023-01-11T21:21:26.4183189Z inflating: build/bin/c10_registry_test 2023-01-11T21:21:26.4247217Z inflating: build/bin/c10_string_view_test 2023-01-11T21:21:26.4304546Z inflating: build/bin/c10_tempfile_test 2023-01-11T21:21:26.4366181Z inflating: build/bin/c10_typeid_test 2023-01-11T21:21:26.4427305Z inflating: build/bin/c10_intrusive_ptr_benchmark 2023-01-11T21:21:26.4948790Z inflating: build/bin/protoc-3.13.0.0 2023-01-11T21:21:26.5469693Z inflating: build/bin/protoc 2023-01-11T21:21:26.5528875Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream 2023-01-11T21:21:26.5587951Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test 2023-01-11T21:21:26.5646593Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device 2023-01-11T21:21:26.5704513Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes 2023-01-11T21:21:26.5763396Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads 2023-01-11T21:21:26.5822557Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks 2023-01-11T21:21:26.5876175Z inflating: build/bin/c10_cuda_CUDATest 2023-01-11T21:21:26.5934922Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block 2023-01-11T21:21:26.6258365Z inflating: build/bin/vec_test_all_types_DEFAULT 2023-01-11T21:21:26.6618099Z inflating: build/bin/vec_test_all_types_AVX2 2023-01-11T21:21:26.6677828Z inflating: 
build/bin/HashStoreTest 2023-01-11T21:21:26.6743945Z inflating: build/bin/TCPStoreTest 2023-01-11T21:21:26.6803583Z inflating: build/bin/FileStoreTest 2023-01-11T21:21:26.6819935Z inflating: build/bin/ProcessGroupMPITest 2023-01-11T21:21:26.6882847Z inflating: build/bin/test_edge_op_registration 2023-01-11T21:21:26.6886356Z inflating: build/bin/example_allreduce 2023-01-11T21:21:26.6945181Z inflating: build/bin/Dimname_test 2023-01-11T21:21:26.7025533Z inflating: build/bin/Dict_test 2023-01-11T21:21:26.7096304Z inflating: build/bin/MaybeOwned_test 2023-01-11T21:21:26.7159233Z inflating: build/bin/NamedTensor_test 2023-01-11T21:21:26.7224885Z inflating: build/bin/apply_utils_test 2023-01-11T21:21:26.7292091Z inflating: build/bin/basic 2023-01-11T21:21:26.7357018Z inflating: build/bin/atest 2023-01-11T21:21:26.7417054Z inflating: build/bin/broadcast_test 2023-01-11T21:21:26.7480957Z inflating: build/bin/cpu_generator_test 2023-01-11T21:21:26.7539271Z inflating: build/bin/cpu_profiling_allocator_test 2023-01-11T21:21:26.7636105Z inflating: build/bin/cpu_rng_test 2023-01-11T21:21:26.7691318Z inflating: build/bin/dispatch_key_set_test 2023-01-11T21:21:26.7746975Z inflating: build/bin/dlconvertor_test 2023-01-11T21:21:26.7811287Z inflating: build/bin/extension_backend_test 2023-01-11T21:21:26.7873008Z inflating: build/bin/half_test 2023-01-11T21:21:26.7927245Z inflating: build/bin/lazy_tensor_test 2023-01-11T21:21:26.7987364Z inflating: build/bin/math_kernel_test 2023-01-11T21:21:26.8090948Z inflating: build/bin/ivalue_test 2023-01-11T21:21:26.8150787Z inflating: build/bin/memory_format_test 2023-01-11T21:21:26.8210291Z inflating: build/bin/memory_overlapping_test 2023-01-11T21:21:26.8266508Z inflating: build/bin/operator_name_test 2023-01-11T21:21:26.8324683Z inflating: build/bin/mobile_memory_cleanup 2023-01-11T21:21:26.8386251Z inflating: build/bin/native_test 2023-01-11T21:21:26.8441778Z inflating: build/bin/operators_test 2023-01-11T21:21:26.8500421Z inflating: 
build/bin/packedtensoraccessor_test 2023-01-11T21:21:26.8564093Z inflating: build/bin/quantized_test 2023-01-11T21:21:26.8636878Z inflating: build/bin/pow_test 2023-01-11T21:21:26.8691241Z inflating: build/bin/reduce_ops_test 2023-01-11T21:21:26.8747772Z inflating: build/bin/reportMemoryUsage_test 2023-01-11T21:21:26.8810105Z inflating: build/bin/scalar_tensor_test 2023-01-11T21:21:26.8873417Z inflating: build/bin/scalar_test 2023-01-11T21:21:26.8931542Z inflating: build/bin/stride_properties_test 2023-01-11T21:21:26.9018156Z inflating: build/bin/tensor_iterator_test 2023-01-11T21:21:26.9079019Z inflating: build/bin/type_ptr_test 2023-01-11T21:21:26.9082377Z inflating: build/bin/thread_init_test 2023-01-11T21:21:26.9144338Z inflating: build/bin/test_parallel 2023-01-11T21:21:26.9198866Z inflating: build/bin/variant_test 2023-01-11T21:21:26.9257532Z inflating: build/bin/undefined_tensor_test 2023-01-11T21:21:26.9324444Z inflating: build/bin/type_test 2023-01-11T21:21:26.9325909Z inflating: build/bin/verify_api_visibility 2023-01-11T21:21:26.9402115Z inflating: build/bin/legacy_vmap_test 2023-01-11T21:21:26.9458802Z inflating: build/bin/weakref_test 2023-01-11T21:21:26.9515250Z inflating: build/bin/wrapdim_test 2023-01-11T21:21:26.9634872Z inflating: build/bin/List_test 2023-01-11T21:21:26.9701146Z inflating: build/bin/IListRef_test 2023-01-11T21:21:26.9755199Z inflating: build/bin/xla_tensor_test 2023-01-11T21:21:26.9888788Z inflating: build/bin/kernel_function_legacy_test 2023-01-11T21:21:26.9960547Z inflating: build/bin/KernelFunction_test 2023-01-11T21:21:27.0066132Z inflating: build/bin/kernel_function_test 2023-01-11T21:21:27.0206966Z inflating: build/bin/kernel_lambda_legacy_test 2023-01-11T21:21:27.0273768Z inflating: build/bin/kernel_stackbased_test 2023-01-11T21:21:27.0387263Z inflating: build/bin/kernel_lambda_test 2023-01-11T21:21:27.0443725Z inflating: build/bin/CppSignature_test 2023-01-11T21:21:27.0548799Z inflating: 
build/bin/make_boxed_from_unboxed_functor_test 2023-01-11T21:21:27.0601872Z inflating: build/bin/op_allowlist_test 2023-01-11T21:21:27.0662165Z inflating: build/bin/inline_container_test 2023-01-11T21:21:27.0724129Z inflating: build/bin/backend_fallback_test 2023-01-11T21:21:27.1039243Z inflating: build/bin/op_registration_test 2023-01-11T21:21:27.1097646Z inflating: build/bin/cuda_apply_test 2023-01-11T21:21:27.1175901Z inflating: build/bin/cuda_complex_math_test 2023-01-11T21:21:27.1230167Z inflating: build/bin/cuda_device_test 2023-01-11T21:21:27.1289847Z inflating: build/bin/cuda_caching_host_allocator_test 2023-01-11T21:21:27.1356006Z inflating: build/bin/cuda_atomic_ops_test 2023-01-11T21:21:27.1411310Z inflating: build/bin/cuda_dlconvertor_test 2023-01-11T21:21:27.1475862Z inflating: build/bin/cuda_complex_test 2023-01-11T21:21:27.1541329Z inflating: build/bin/cuda_cub_test 2023-01-11T21:21:27.1597028Z inflating: build/bin/cuda_integer_divider_test 2023-01-11T21:21:27.1671227Z inflating: build/bin/cuda_distributions_test 2023-01-11T21:21:27.1730002Z inflating: build/bin/cuda_reportMemoryUsage_test 2023-01-11T21:21:27.1796973Z inflating: build/bin/cuda_stream_test 2023-01-11T21:21:27.1862131Z inflating: build/bin/cuda_generator_test 2023-01-11T21:21:27.1915834Z inflating: build/bin/cuda_optional_test 2023-01-11T21:21:27.1970099Z inflating: build/bin/cuda_half_test 2023-01-11T21:21:27.2027032Z inflating: build/bin/cuda_packedtensoraccessor_test 2023-01-11T21:21:27.2044908Z inflating: build/bin/tutorial_tensorexpr 2023-01-11T21:21:27.2117087Z inflating: build/bin/ProcessGroupGlooTest 2023-01-11T21:21:27.2171916Z inflating: build/bin/cuda_cudnn_test 2023-01-11T21:21:27.2230879Z inflating: build/bin/ProcessGroupUCCTest 2023-01-11T21:21:27.2290551Z inflating: build/bin/test_dist_autograd 2023-01-11T21:21:27.2355088Z inflating: build/bin/ProcessGroupGlooAsyncTest 2023-01-11T21:21:27.2419283Z inflating: build/bin/ProcessGroupNCCLErrorsTest 
2023-01-11T21:21:27.2487147Z inflating: build/bin/ProcessGroupNCCLTest 2023-01-11T21:21:27.2564037Z inflating: build/bin/test_cpp_rpc 2023-01-11T21:21:27.2566755Z inflating: build/bin/parallel_benchmark 2023-01-11T21:21:27.2642275Z inflating: build/bin/test_mobile_nnc 2023-01-11T21:21:27.2653855Z inflating: build/bin/aot_model_compiler_test 2023-01-11T21:21:27.2711772Z inflating: build/bin/cuda_vectorized_test 2023-01-11T21:21:27.2717726Z inflating: build/bin/torch_shm_manager 2023-01-11T21:21:27.3106549Z inflating: build/bin/test_lazy 2023-01-11T21:21:27.4016135Z inflating: build/bin/test_tensorexpr 2023-01-11T21:21:27.5334898Z inflating: build/bin/test_api 2023-01-11T21:21:27.6537850Z inflating: build/bin/test_jit 2023-01-11T21:21:27.6540877Z inflating: .pytorch-test-times.json 2023-01-11T21:21:27.6570549Z ##[group]Run df -H 2023-01-11T21:21:27.6570786Z df -H 2023-01-11T21:21:27.6584856Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:21:27.6585156Z env: 2023-01-11T21:21:27.6585397Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:21:27.6585651Z GPU_FLAG: --gpus all 2023-01-11T21:21:27.6585898Z ##[endgroup] 2023-01-11T21:21:27.6625710Z Filesystem Size Used Avail Use% Mounted on 2023-01-11T21:21:27.6626056Z devtmpfs 129G 0 129G 0% /dev 2023-01-11T21:21:27.6626354Z tmpfs 129G 25k 129G 1% /dev/shm 2023-01-11T21:21:27.6626620Z tmpfs 129G 553k 129G 1% /run 2023-01-11T21:21:27.6627283Z tmpfs 129G 0 129G 0% /sys/fs/cgroup 2023-01-11T21:21:27.6627618Z /dev/xvda1 162G 27G 135G 17% / 2023-01-11T21:21:27.6649778Z ##[group]Run .github/scripts/parse_ref.py 2023-01-11T21:21:27.6650147Z .github/scripts/parse_ref.py 2023-01-11T21:21:27.6662175Z shell: /usr/bin/bash -e {0} 2023-01-11T21:21:27.6662429Z env: 2023-01-11T21:21:27.6662691Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:21:27.6662976Z GPU_FLAG: --gpus all 2023-01-11T21:21:27.6663234Z ##[endgroup] 2023-01-11T21:21:27.6957136Z ##[group]Run set -x 2023-01-11T21:21:27.6957520Z set -x 2023-01-11T21:21:27.6957754Z 
 2023-01-11T21:21:27.6958015Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2023-01-11T21:21:27.6958370Z  TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh 2023-01-11T21:21:27.6958726Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2023-01-11T21:21:27.6959054Z  TEST_COMMAND=.jenkins/onnx/test.sh 2023-01-11T21:21:27.6959309Z else 2023-01-11T21:21:27.6959603Z  TEST_COMMAND=.jenkins/pytorch/test.sh 2023-01-11T21:21:27.6959881Z fi 2023-01-11T21:21:27.6960089Z  2023-01-11T21:21:27.6960410Z COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}") 2023-01-11T21:21:27.6960730Z  2023-01-11T21:21:27.6961013Z # sanitize the input commit message and PR body here: 2023-01-11T21:21:27.6961309Z # 2023-01-11T21:21:27.6961691Z # trim all new lines from commit messages + PR_BODY to avoid issues with batch environment 2023-01-11T21:21:27.6962184Z # variable copying. see https://github.com/pytorch/pytorch/pull/80043#issuecomment-1167796028 2023-01-11T21:21:27.6962645Z COMMIT_MESSAGES="${COMMIT_MESSAGES//[$'\n\r']}" 2023-01-11T21:21:27.6962965Z PR_BODY="${PR_BODY//[$'\n\r']}" 2023-01-11T21:21:27.6963226Z  2023-01-11T21:21:27.6963566Z # then trim all special characters like single and double quotes to avoid unescaped inputs to 2023-01-11T21:21:27.6963951Z # wreak havoc internally 2023-01-11T21:21:27.6964281Z export COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}" 2023-01-11T21:21:27.6964599Z export PR_BODY="${PR_BODY//[\'\"]}" 2023-01-11T21:21:27.6964864Z  2023-01-11T21:21:27.6965175Z # detached container should get cleaned up by teardown_ec2_linux 2023-01-11T21:21:27.6965564Z # TODO: Stop building test binaries as part of the build phase 2023-01-11T21:21:27.6965944Z # Used for GPU_FLAG since that doesn't play nice 2023-01-11T21:21:27.6966277Z # shellcheck disable=SC2086,SC2090 2023-01-11T21:21:27.6966584Z container_name=$(docker run \ 2023-01-11T21:21:27.6966847Z  ${GPU_FLAG:-} \ 2023-01-11T21:21:27.6967125Z  -e BUILD_ENVIRONMENT \ 2023-01-11T21:21:27.6967409Z  -e PR_NUMBER \ 
2023-01-11T21:21:27.6967662Z  -e GITHUB_ACTIONS \ 2023-01-11T21:21:27.6967929Z  -e BASE_SHA \ 2023-01-11T21:21:27.6968190Z  -e BRANCH \ 2023-01-11T21:21:27.6968422Z  -e SHA1 \ 2023-01-11T21:21:27.6968685Z  -e AWS_DEFAULT_REGION \ 2023-01-11T21:21:27.6968963Z  -e IN_WHEEL_TEST \ 2023-01-11T21:21:27.6969211Z  -e SHARD_NUMBER \ 2023-01-11T21:21:27.6969481Z  -e TEST_CONFIG \ 2023-01-11T21:21:27.6969751Z  -e NUM_TEST_SHARDS \ 2023-01-11T21:21:27.6970013Z  -e PR_BODY \ 2023-01-11T21:21:27.6970264Z  -e COMMIT_MESSAGES \ 2023-01-11T21:21:27.6970552Z  -e CONTINUE_THROUGH_ERROR \ 2023-01-11T21:21:27.6970858Z  -e PYTORCH_RETRY_TEST_CASES \ 2023-01-11T21:21:27.6971158Z  -e PYTORCH_OVERRIDE_FLAKY_SIGNAL \ 2023-01-11T21:21:27.6971448Z  -e PR_LABELS \ 2023-01-11T21:21:27.6971741Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2023-01-11T21:21:27.6972018Z  -e SCCACHE_BUCKET \ 2023-01-11T21:21:27.6972300Z  -e SCCACHE_S3_KEY_PREFIX \ 2023-01-11T21:21:27.6972699Z  -e XLA_CUDA \ 2023-01-11T21:21:27.6973189Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ 2023-01-11T21:21:27.6973519Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2023-01-11T21:21:27.6973854Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2023-01-11T21:21:27.6974209Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 2023-01-11T21:21:27.6974518Z  --ulimit stack=10485760:83886080 \ 2023-01-11T21:21:27.6974925Z  --security-opt seccomp=unconfined \ 2023-01-11T21:21:27.6975251Z  --cap-add=SYS_PTRACE \ 2023-01-11T21:21:27.6975509Z  --ipc=host \ 2023-01-11T21:21:27.6975783Z  --shm-size="${SHM_SIZE}" \ 2023-01-11T21:21:27.6976047Z  --tty \ 2023-01-11T21:21:27.6976274Z  --detach \ 2023-01-11T21:21:27.6976546Z  --name="${container_name}" \ 2023-01-11T21:21:27.6976827Z  --user jenkins \ 2023-01-11T21:21:27.6977134Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2023-01-11T21:21:27.6977490Z  -w /var/lib/jenkins/workspace \ 2023-01-11T21:21:27.6977777Z  "${DOCKER_IMAGE}" 2023-01-11T21:21:27.6978023Z ) 2023-01-11T21:21:27.6978309Z echo 
"DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}" 2023-01-11T21:21:27.6978757Z docker exec -t "${container_name}" sh -c "pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}" 2023-01-11T21:21:27.6991076Z shell: /usr/bin/bash -e {0} 2023-01-11T21:21:27.6991312Z env: 2023-01-11T21:21:27.6991555Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:21:27.6991829Z GPU_FLAG: --gpus all 2023-01-11T21:21:27.6992144Z BUILD_ENVIRONMENT: linux-bionic-cuda11.7-py3.10-gcc7 2023-01-11T21:21:27.6992458Z PR_NUMBER: 2023-01-11T21:21:27.6992696Z BRANCH: 2023-01-11T21:21:27.6992962Z SHA1: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:21:27.6993306Z BASE_SHA: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:21:27.6993626Z PYTORCH_RETRY_TEST_CASES: 1 2023-01-11T21:21:27.6993899Z PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1 2023-01-11T21:21:27.6994182Z TEST_CONFIG: distributed 2023-01-11T21:21:27.6994440Z SHARD_NUMBER: 3 2023-01-11T21:21:27.6994667Z NUM_TEST_SHARDS: 3 2023-01-11T21:21:27.6994908Z PR_BODY: 2023-01-11T21:21:27.6995165Z CONTINUE_THROUGH_ERROR: False 2023-01-11T21:21:27.6995503Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2023-01-11T21:21:27.6995816Z SCCACHE_S3_KEY_PREFIX: trunk 2023-01-11T21:21:27.6996077Z SHM_SIZE: 2g 2023-01-11T21:21:27.6996574Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:21:27.6997038Z XLA_CUDA: 2023-01-11T21:21:27.6997390Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla 2023-01-11T21:21:27.6997773Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 0 2023-01-11T21:21:27.6998061Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2023-01-11T21:21:27.6998341Z ##[endgroup] 2023-01-11T21:21:27.7026472Z + [[ distributed == \m\u\l\t\i\g\p\u ]] 2023-01-11T21:21:27.7026942Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 == *onnx* ]] 2023-01-11T21:21:27.7027285Z + TEST_COMMAND=.jenkins/pytorch/test.sh 2023-01-11T21:21:27.7030487Z ++ git 
cherry -v origin/master 2023-01-11T21:21:27.7568230Z + COMMIT_MESSAGES='+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into '\''input'\'' 2023-01-11T21:21:27.7568736Z + 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch 2023-01-11T21:21:27.7569314Z + 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e '\''other'\'' instead of '\''output'\'' in documentation' 2023-01-11T21:21:27.7570705Z + COMMIT_MESSAGES='+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into '\''input'\''+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e '\''other'\'' instead of '\''output'\'' in documentation' 2023-01-11T21:21:27.7571528Z + PR_BODY= 2023-01-11T21:21:27.7573247Z + export 'COMMIT_MESSAGES=+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation' 2023-01-11T21:21:27.7574592Z + COMMIT_MESSAGES='+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation' 2023-01-11T21:21:27.7575202Z + export PR_BODY= 2023-01-11T21:21:27.7575442Z + PR_BODY= 2023-01-11T21:21:27.7582373Z +++ nproc --ignore=2 2023-01-11T21:21:27.7595118Z ++ docker run --gpus all -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e PR_BODY -e COMMIT_MESSAGES -e CONTINUE_THROUGH_ERROR -e PYTORCH_RETRY_TEST_CASES -e PYTORCH_OVERRIDE_FLAKY_SIGNAL -e PR_LABELS -e MAX_JOBS=30 -e SCCACHE_BUCKET -e SCCACHE_S3_KEY_PREFIX -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME 
-e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS --env-file=/tmp/github_env_3896346758 --ulimit stack=10485760:83886080 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:21:41.9286522Z + container_name=7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T21:21:41.9287275Z + echo DOCKER_CONTAINER_ID=7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T21:21:41.9291939Z ++ echo dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl 2023-01-11T21:21:41.9294472Z + docker exec -t 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 sh -c 'pip install dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl[opt-einsum] && .jenkins/pytorch/test.sh' 2023-01-11T21:21:42.4988170Z Processing ./dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl 2023-01-11T21:21:43.4563644Z Requirement already satisfied: networkx in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (2.6.3) 2023-01-11T21:21:43.4566536Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (4.4.0) 2023-01-11T21:21:43.4573253Z Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (1.11.1) 2023-01-11T21:21:43.4591709Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (3.3.0) 2023-01-11T21:21:43.4671313Z Requirement already satisfied: numpy>=1.7 in /opt/conda/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.0.0a0+git8419ddd) (1.21.2) 
2023-01-11T21:21:43.4888488Z Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torch==2.0.0a0+git8419ddd) (1.2.1) 2023-01-11T21:21:44.4189582Z Installing collected packages: torch 2023-01-11T21:21:54.0230835Z Successfully installed torch-2.0.0a0+git8419ddd 2023-01-11T21:21:54.1807499Z + echo 'Environment variables:' 2023-01-11T21:21:54.1807968Z Environment variables: 2023-01-11T21:21:54.1811635Z + env 2023-01-11T21:21:54.1815669Z SHARD_NUMBER=3 2023-01-11T21:21:54.1816145Z NV_LIBCUBLAS_DEV_VERSION=11.10.1.25-1 2023-01-11T21:21:54.1817444Z NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-7 2023-01-11T21:21:54.1817899Z LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 2023-01-11T21:21:54.1818505Z NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.13.4-1+cuda11.7 2023-01-11T21:21:54.1818860Z UCC_HOME=/usr 2023-01-11T21:21:54.1819545Z BUILD_ENVIRONMENT=linux-bionic-cuda11.7-py3.10-gcc7 2023-01-11T21:21:54.1819957Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2023-01-11T21:21:54.1820647Z NV_LIBNPP_DEV_PACKAGE=libnpp-dev-11-7=11.7.3.21-1 2023-01-11T21:21:54.1821000Z INSTALLED_DB=yes 2023-01-11T21:21:54.1821797Z HOSTNAME=7e0e28e30a97 2023-01-11T21:21:54.1824379Z GITHUB_REF_NAME=ciflow/trunk/91627 2023-01-11T21:21:54.1824802Z GITHUB_API_URL=https://api.github.com 2023-01-11T21:21:54.1825367Z GITHUB_REPOSITORY_OWNER_ID=21003710 2023-01-11T21:21:54.1825911Z OPENSSL_DIR=/opt/openssl 2023-01-11T21:21:54.1826339Z UCC_COMMIT=1c7a7127186e7836f73aafbd7697bbc274a77eee 2023-01-11T21:21:54.1827129Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_08636d76-1c9e-49f4-ae4f-10de4734a716 2023-01-11T21:21:54.1827848Z CUDA_PATH=/usr/local/cuda 2023-01-11T21:21:54.1828397Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2023-01-11T21:21:54.1828901Z GITHUB_RUN_ATTEMPT=1 2023-01-11T21:21:54.1829553Z TEST_CONFIG=distributed 2023-01-11T21:21:54.1829942Z 
NV_LIBNPP_VERSION=11.7.3.21-1 2023-01-11T21:21:54.1830343Z NV_NVPROF_DEV_PACKAGE=cuda-nvprof-11-7=11.7.50-1 2023-01-11T21:21:54.1830746Z GITHUB_REPOSITORY_OWNER=pytorch 2023-01-11T21:21:54.1831094Z GITHUB_ACTIONS=true 2023-01-11T21:21:54.1831374Z NVIDIA_VISIBLE_DEVICES=all 2023-01-11T21:21:54.1831744Z NV_NVPROF_VERSION=11.7.50-1 2023-01-11T21:21:54.1832133Z NV_LIBCUSPARSE_VERSION=11.7.3.50-1 2023-01-11T21:21:54.1832556Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/tags/ciflow/trunk/91627 2023-01-11T21:21:54.1833001Z NVIDIA_PRODUCT_NAME=CUDA 2023-01-11T21:21:54.1833850Z CI=true 2023-01-11T21:21:54.1834666Z PYTORCH_OVERRIDE_FLAKY_SIGNAL=1 2023-01-11T21:21:54.1835177Z NV_LIBCUBLAS_DEV_PACKAGE=libcublas-dev-11-7=11.10.1.25-1 2023-01-11T21:21:54.1835543Z BRANCH= 2023-01-11T21:21:54.1836356Z GITHUB_HEAD_REF= 2023-01-11T21:21:54.1884610Z UCX_COMMIT=31e74cac7bee0ef66bef2af72e7d86d9c282e5ab 2023-01-11T21:21:54.1885107Z GITHUB_ACTOR=pytorch-bot[bot] 2023-01-11T21:21:54.1885443Z CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache 2023-01-11T21:21:54.1885781Z GITHUB_ACTION_REF= 2023-01-11T21:21:54.1886045Z NCCL_VERSION=2.13.4-1 2023-01-11T21:21:54.1886309Z GITHUB_ACTION=__self 2023-01-11T21:21:54.1886566Z GITHUB_REF_PROTECTED=false 2023-01-11T21:21:54.1887009Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2023-01-11T21:21:54.1887427Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2023-01-11T21:21:54.1888047Z *** 2023-01-11T21:21:54.1888291Z INSTALLED_VISION=yes 2023-01-11T21:21:54.1888512Z NVARCH=x86_64 2023-01-11T21:21:54.1888807Z NV_LIBCUSPARSE_DEV_VERSION=11.7.3.50-1 2023-01-11T21:21:54.1889067Z HOME=/var/lib/jenkins 2023-01-11T21:21:54.1889578Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_08636d76-1c9e-49f4-ae4f-10de4734a716 2023-01-11T21:21:54.1889974Z CARGO_NET_GIT_FETCH_WITH_CLI=true 2023-01-11T21:21:54.1890248Z NVIDIA_CUDA_END_OF_LIFE=1 2023-01-11T21:21:54.1890510Z 
GITHUB_ACTION_REPOSITORY= 2023-01-11T21:21:54.1890771Z GITHUB_REF_TYPE=tag 2023-01-11T21:21:54.1891079Z NV_LIBNCCL_PACKAGE_VERSION=2.13.4-1 2023-01-11T21:21:54.1891352Z GITHUB_RETENTION_DAYS=90 2023-01-11T21:21:54.1891739Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2023-01-11T21:21:54.1892154Z NV_LIBNCCL_PACKAGE=libnccl2=2.13.4-1+cuda11.7 2023-01-11T21:21:54.1892707Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_08636d76-1c9e-49f4-ae4f-10de4734a716 2023-01-11T21:21:54.1893751Z DEBIAN_FRONTEND=noninteractive 2023-01-11T21:21:54.1894206Z NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev 2023-01-11T21:21:54.1894524Z GITHUB_REF=refs/tags/ciflow/trunk/91627 2023-01-11T21:21:54.1894822Z NV_CUDA_LIB_VERSION=11.7.0-1 2023-01-11T21:21:54.1895148Z GITHUB_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:21:54.1895459Z INSTALLED_PROTOBUF=yes 2023-01-11T21:21:54.1895738Z GITHUB_REPOSITORY_ID=65600975 2023-01-11T21:21:54.1895992Z GITHUB_RUN_ID=3896346758 2023-01-11T21:21:54.1896549Z NV_LIBNPP_PACKAGE=libnpp-11-7=11.7.3.21-1 2023-01-11T21:21:54.1896858Z NV_LIBNCCL_PACKAGE_NAME=libnccl2 2023-01-11T21:21:54.1897142Z LIBRARY_PATH=/usr/local/cuda/lib64/stubs 2023-01-11T21:21:54.1897456Z NV_NVTX_VERSION=11.7.50-1 2023-01-11T21:21:54.1897729Z CONTINUE_THROUGH_ERROR=False 2023-01-11T21:21:54.1898020Z GITHUB_SERVER_URL=https://github.com 2023-01-11T21:21:54.1898300Z MAX_JOBS=30 2023-01-11T21:21:54.1898551Z GITHUB_ACTOR_ID=54816060 2023-01-11T21:21:54.1898930Z NV_LIBCUBLAS_VERSION=11.10.1.25-1 2023-01-11T21:21:54.1899333Z NV_LIBCUBLAS_PACKAGE=libcublas-11-7=11.10.1.25-1 2023-01-11T21:21:54.1899822Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2023-01-11T21:21:54.1900153Z UCX_HOME=/usr 2023-01-11T21:21:54.1900412Z PYTORCH_RETRY_TEST_CASES=1 2023-01-11T21:21:54.1900748Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2023-01-11T21:21:54.1901090Z BASE_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 
2023-01-11T21:21:54.1901442Z NV_CUDA_CUDART_DEV_VERSION=11.7.60-1 2023-01-11T21:21:54.1901718Z PR_BODY= 2023-01-11T21:21:54.1901934Z GITHUB_BASE_REF= 2023-01-11T21:21:54.1902173Z TERM=xterm 2023-01-11T21:21:54.1902398Z XLA_CUDA= 2023-01-11T21:21:54.1902654Z NV_NVML_DEV_VERSION=11.7.50-1 2023-01-11T21:21:54.1902931Z TORCH_CUDA_ARCH_LIST=Maxwell 2023-01-11T21:21:54.1903194Z CUDA_VERSION=11.7.0 2023-01-11T21:21:54.1903518Z NV_LIBCUBLAS_PACKAGE_NAME=libcublas-11-7 2023-01-11T21:21:54.1903821Z OPENSSL_ROOT_DIR=/opt/openssl 2023-01-11T21:21:54.1904368Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_08636d76-1c9e-49f4-ae4f-10de4734a716 2023-01-11T21:21:54.1904748Z GITHUB_JOB=test 2023-01-11T21:21:54.1904990Z SCCACHE_S3_KEY_PREFIX=trunk 2023-01-11T21:21:54.1905599Z COMMIT_MESSAGES=+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation 2023-01-11T21:21:54.1906282Z NVIDIA_DRIVER_CAPABILITIES=compute,utility 2023-01-11T21:21:54.1906578Z NUM_TEST_SHARDS=3 2023-01-11T21:21:54.1906802Z PR_NUMBER= 2023-01-11T21:21:54.1907333Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_08636d76-1c9e-49f4-ae4f-10de4734a716 2023-01-11T21:21:54.1907713Z SHLVL=1 2023-01-11T21:21:54.1908044Z NV_LIBCUBLAS_DEV_PACKAGE_NAME=libcublas-dev-11-7 2023-01-11T21:21:54.1908375Z GITHUB_REPOSITORY=pytorch/pytorch 2023-01-11T21:21:54.1909540Z NVIDIA_REQUIRE_CUDA=cuda>=11.7 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 
brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=510,driver<511 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511 brand=nvidiartx,driver>=510,driver<511 brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511 brand=titanrtx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511 2023-01-11T21:21:54.1910662Z NV_LIBNPP_DEV_VERSION=11.7.3.21-1 2023-01-11T21:21:54.1910986Z SHA1=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:21:54.1911286Z GITHUB_EVENT_NAME=push 2023-01-11T21:21:54.1911575Z NV_CUDA_CUDART_VERSION=11.7.60-1 2023-01-11T21:21:54.1911929Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2023-01-11T21:21:54.1912223Z GITHUB_RUN_NUMBER=22986 2023-01-11T21:21:54.1912466Z GITHUB_WORKFLOW=trunk 2023-01-11T21:21:54.1912883Z PATH=/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:21:54.1913333Z NV_LIBNCCL_DEV_PACKAGE_VERSION=2.13.4-1 2023-01-11T21:21:54.1913669Z GITHUB_WORKFLOW_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:21:54.1914239Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2023-01-11T21:21:54.1914661Z GITHUB_TRIGGERING_ACTOR=pytorch-bot[bot] 2023-01-11T21:21:54.1914925Z _=/usr/bin/env 2023-01-11T21:21:54.1915306Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2023-01-11T21:21:54.2042598Z + TORCH_INSTALL_DIR=/opt/conda/lib/python3.10/site-packages/torch 2023-01-11T21:21:54.2043210Z + TORCH_BIN_DIR=/opt/conda/lib/python3.10/site-packages/torch/bin 2023-01-11T21:21:54.2043706Z + TORCH_LIB_DIR=/opt/conda/lib/python3.10/site-packages/torch/lib 2023-01-11T21:21:54.2044170Z + TORCH_TEST_DIR=/opt/conda/lib/python3.10/site-packages/torch/test 2023-01-11T21:21:54.2044490Z + BUILD_DIR=build 2023-01-11T21:21:54.2044743Z + BUILD_RENAMED_DIR=build_renamed 2023-01-11T21:21:54.2045022Z + 
BUILD_BIN_DIR=build/bin 2023-01-11T21:21:54.2045287Z + export VALGRIND=ON 2023-01-11T21:21:54.2045518Z + VALGRIND=ON 2023-01-11T21:21:54.2045786Z + export TORCH_INDUCTOR_INSTALL_GXX=ON 2023-01-11T21:21:54.2046095Z + TORCH_INDUCTOR_INSTALL_GXX=ON 2023-01-11T21:21:54.2046509Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 == *clang9* ]] 2023-01-11T21:21:54.2046929Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 != *bazel* ]] 2023-01-11T21:21:54.2048904Z ++ realpath build/custom_test_artifacts 2023-01-11T21:21:54.2056960Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2023-01-11T21:21:54.2060005Z ++ dirname .jenkins/pytorch/test.sh 2023-01-11T21:21:54.2066635Z + source .jenkins/pytorch/common.sh 2023-01-11T21:21:54.2070346Z +++ dirname .jenkins/pytorch/common.sh 2023-01-11T21:21:54.2081021Z ++ source .jenkins/pytorch/common_utils.sh 2023-01-11T21:21:54.2083139Z +++ declare -f -t trap_add 2023-01-11T21:21:54.2089332Z ++ set -ex 2023-01-11T21:21:54.2089935Z ++ [[ linux-bionic-cuda11.7-py3.10-gcc7 == *rocm* ]] 2023-01-11T21:21:54.2090248Z ++ BUILD_TEST_LIBTORCH=0 2023-01-11T21:21:54.2090582Z + echo 'Environment variables' 2023-01-11T21:21:54.2090871Z Environment variables 2023-01-11T21:21:54.2091095Z + env 2023-01-11T21:21:54.2098521Z SHARD_NUMBER=3 2023-01-11T21:21:54.2099359Z NV_LIBCUBLAS_DEV_VERSION=11.10.1.25-1 2023-01-11T21:21:54.2100249Z NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-7 2023-01-11T21:21:54.2100999Z LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 2023-01-11T21:21:54.2101888Z NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.13.4-1+cuda11.7 2023-01-11T21:21:54.2102497Z UCC_HOME=/usr 2023-01-11T21:21:54.2103208Z BUILD_ENVIRONMENT=linux-bionic-cuda11.7-py3.10-gcc7 2023-01-11T21:21:54.2103882Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2023-01-11T21:21:54.2104692Z NV_LIBNPP_DEV_PACKAGE=libnpp-dev-11-7=11.7.3.21-1 2023-01-11T21:21:54.2105056Z INSTALLED_DB=yes 2023-01-11T21:21:54.2105316Z HOSTNAME=7e0e28e30a97 2023-01-11T21:21:54.2105593Z 
GITHUB_REF_NAME=ciflow/trunk/91627 2023-01-11T21:21:54.2105903Z GITHUB_API_URL=https://api.github.com 2023-01-11T21:21:54.2106226Z GITHUB_REPOSITORY_OWNER_ID=21003710 2023-01-11T21:21:54.2106516Z OPENSSL_DIR=/opt/openssl 2023-01-11T21:21:54.2106816Z UCC_COMMIT=1c7a7127186e7836f73aafbd7697bbc274a77eee 2023-01-11T21:21:54.2107426Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_08636d76-1c9e-49f4-ae4f-10de4734a716 2023-01-11T21:21:54.2107842Z CUDA_PATH=/usr/local/cuda 2023-01-11T21:21:54.2108501Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2023-01-11T21:21:54.2109204Z GITHUB_RUN_ATTEMPT=1 2023-01-11T21:21:54.2109717Z TEST_CONFIG=distributed 2023-01-11T21:21:54.2110320Z NV_LIBNPP_VERSION=11.7.3.21-1 2023-01-11T21:21:54.2111125Z NV_NVPROF_DEV_PACKAGE=cuda-nvprof-11-7=11.7.50-1 2023-01-11T21:21:54.2111527Z GITHUB_REPOSITORY_OWNER=pytorch 2023-01-11T21:21:54.2111781Z GITHUB_ACTIONS=true 2023-01-11T21:21:54.2112038Z NVIDIA_VISIBLE_DEVICES=all 2023-01-11T21:21:54.2112336Z NV_NVPROF_VERSION=11.7.50-1 2023-01-11T21:21:54.2112634Z NV_LIBCUSPARSE_VERSION=11.7.3.50-1 2023-01-11T21:21:54.2113029Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/tags/ciflow/trunk/91627 2023-01-11T21:21:54.2113591Z NVIDIA_PRODUCT_NAME=CUDA 2023-01-11T21:21:54.2113843Z CI=true 2023-01-11T21:21:54.2114076Z PYTORCH_OVERRIDE_FLAKY_SIGNAL=1 2023-01-11T21:21:54.2114493Z NV_LIBCUBLAS_DEV_PACKAGE=libcublas-dev-11-7=11.10.1.25-1 2023-01-11T21:21:54.2114789Z BRANCH= 2023-01-11T21:21:54.2114999Z GITHUB_HEAD_REF= 2023-01-11T21:21:54.2115313Z UCX_COMMIT=31e74cac7bee0ef66bef2af72e7d86d9c282e5ab 2023-01-11T21:21:54.2115775Z GITHUB_ACTOR=pytorch-bot[bot] 2023-01-11T21:21:54.2116101Z CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache 2023-01-11T21:21:54.2116402Z GITHUB_ACTION_REF= 2023-01-11T21:21:54.2116688Z NCCL_VERSION=2.13.4-1 2023-01-11T21:21:54.2116926Z GITHUB_ACTION=__self 
2023-01-11T21:21:54.2117173Z VALGRIND=ON 2023-01-11T21:21:54.2117431Z GITHUB_REF_PROTECTED=false 2023-01-11T21:21:54.2117870Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2023-01-11T21:21:54.2118257Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2023-01-11T21:21:54.2118633Z *** 2023-01-11T21:21:54.2118848Z INSTALLED_VISION=yes 2023-01-11T21:21:54.2119100Z NVARCH=x86_64 2023-01-11T21:21:54.2119401Z NV_LIBCUSPARSE_DEV_VERSION=11.7.3.50-1 2023-01-11T21:21:54.2119667Z HOME=/var/lib/jenkins 2023-01-11T21:21:54.2120206Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_08636d76-1c9e-49f4-ae4f-10de4734a716 2023-01-11T21:21:54.2120616Z CARGO_NET_GIT_FETCH_WITH_CLI=true 2023-01-11T21:21:54.2120936Z NVIDIA_CUDA_END_OF_LIFE=1 2023-01-11T21:21:54.2121218Z GITHUB_ACTION_REPOSITORY= 2023-01-11T21:21:54.2121468Z GITHUB_REF_TYPE=tag 2023-01-11T21:21:54.2121778Z NV_LIBNCCL_PACKAGE_VERSION=2.13.4-1 2023-01-11T21:21:54.2122066Z GITHUB_RETENTION_DAYS=90 2023-01-11T21:21:54.2122436Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2023-01-11T21:21:54.2122849Z NV_LIBNCCL_PACKAGE=libnccl2=2.13.4-1+cuda11.7 2023-01-11T21:21:54.2123402Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_08636d76-1c9e-49f4-ae4f-10de4734a716 2023-01-11T21:21:54.2123791Z DEBIAN_FRONTEND=noninteractive 2023-01-11T21:21:54.2124145Z NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev 2023-01-11T21:21:54.2124458Z GITHUB_REF=refs/tags/ciflow/trunk/91627 2023-01-11T21:21:54.2124757Z NV_CUDA_LIB_VERSION=11.7.0-1 2023-01-11T21:21:54.2125079Z GITHUB_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:21:54.2125389Z INSTALLED_PROTOBUF=yes 2023-01-11T21:21:54.2125649Z GITHUB_REPOSITORY_ID=65600975 2023-01-11T21:21:54.2125924Z GITHUB_RUN_ID=3896346758 2023-01-11T21:21:54.2126272Z NV_LIBNPP_PACKAGE=libnpp-11-7=11.7.3.21-1 2023-01-11T21:21:54.2126562Z NV_LIBNCCL_PACKAGE_NAME=libnccl2 2023-01-11T21:21:54.2126868Z 
LIBRARY_PATH=/usr/local/cuda/lib64/stubs 2023-01-11T21:21:54.2127184Z NV_NVTX_VERSION=11.7.50-1 2023-01-11T21:21:54.2127458Z CONTINUE_THROUGH_ERROR=False 2023-01-11T21:21:54.2127747Z GITHUB_SERVER_URL=https://github.com 2023-01-11T21:21:54.2128026Z MAX_JOBS=30 2023-01-11T21:21:54.2128272Z GITHUB_ACTOR_ID=54816060 2023-01-11T21:21:54.2128557Z NV_LIBCUBLAS_VERSION=11.10.1.25-1 2023-01-11T21:21:54.2128938Z NV_LIBCUBLAS_PACKAGE=libcublas-11-7=11.10.1.25-1 2023-01-11T21:21:54.2129427Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2023-01-11T21:21:54.2129750Z UCX_HOME=/usr 2023-01-11T21:21:54.2130008Z PYTORCH_RETRY_TEST_CASES=1 2023-01-11T21:21:54.2130340Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2023-01-11T21:21:54.2130688Z BASE_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:21:54.2131040Z NV_CUDA_CUDART_DEV_VERSION=11.7.60-1 2023-01-11T21:21:54.2131311Z PR_BODY= 2023-01-11T21:21:54.2131526Z GITHUB_BASE_REF= 2023-01-11T21:21:54.2131761Z TERM=xterm 2023-01-11T21:21:54.2132017Z TORCH_INDUCTOR_INSTALL_GXX=ON 2023-01-11T21:21:54.2132254Z XLA_CUDA= 2023-01-11T21:21:54.2132523Z NV_NVML_DEV_VERSION=11.7.50-1 2023-01-11T21:21:54.2132798Z TORCH_CUDA_ARCH_LIST=Maxwell 2023-01-11T21:21:54.2133445Z CUDA_VERSION=11.7.0 2023-01-11T21:21:54.2133797Z NV_LIBCUBLAS_PACKAGE_NAME=libcublas-11-7 2023-01-11T21:21:54.2134221Z OPENSSL_ROOT_DIR=/opt/openssl 2023-01-11T21:21:54.2134756Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_08636d76-1c9e-49f4-ae4f-10de4734a716 2023-01-11T21:21:54.2135147Z GITHUB_JOB=test 2023-01-11T21:21:54.2135407Z SCCACHE_S3_KEY_PREFIX=trunk 2023-01-11T21:21:54.2136087Z COMMIT_MESSAGES=+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation 2023-01-11T21:21:54.2136775Z 
NVIDIA_DRIVER_CAPABILITIES=compute,utility 2023-01-11T21:21:54.2137051Z NUM_TEST_SHARDS=3 2023-01-11T21:21:54.2137294Z PR_NUMBER= 2023-01-11T21:21:54.2137838Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_08636d76-1c9e-49f4-ae4f-10de4734a716 2023-01-11T21:21:54.2138200Z SHLVL=1 2023-01-11T21:21:54.2138531Z NV_LIBCUBLAS_DEV_PACKAGE_NAME=libcublas-dev-11-7 2023-01-11T21:21:54.2138856Z GITHUB_REPOSITORY=pytorch/pytorch 2023-01-11T21:21:54.2140014Z NVIDIA_REQUIRE_CUDA=cuda>=11.7 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=510,driver<511 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511 brand=nvidiartx,driver>=510,driver<511 brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511 brand=titanrtx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511 2023-01-11T21:21:54.2141116Z NV_LIBNPP_DEV_VERSION=11.7.3.21-1 2023-01-11T21:21:54.2141436Z SHA1=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:21:54.2141724Z GITHUB_EVENT_NAME=push 2023-01-11T21:21:54.2142020Z NV_CUDA_CUDART_VERSION=11.7.60-1 2023-01-11T21:21:54.2142414Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2023-01-11T21:21:54.2142696Z GITHUB_RUN_NUMBER=22986 2023-01-11T21:21:54.2142953Z GITHUB_WORKFLOW=trunk 2023-01-11T21:21:54.2143371Z PATH=/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:21:54.2143810Z NV_LIBNCCL_DEV_PACKAGE_VERSION=2.13.4-1 2023-01-11T21:21:54.2144171Z 
GITHUB_WORKFLOW_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:21:54.2144651Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2023-01-11T21:21:54.2145056Z GITHUB_TRIGGERING_ACTOR=pytorch-bot[bot] 2023-01-11T21:21:54.2145344Z _=/usr/bin/env 2023-01-11T21:21:54.2145636Z + echo 'Testing pytorch' 2023-01-11T21:21:54.2145878Z Testing pytorch 2023-01-11T21:21:54.2146156Z + export LANG=C.UTF-8 2023-01-11T21:21:54.2146427Z + LANG=C.UTF-8 2023-01-11T21:21:54.2146652Z + PR_NUMBER= 2023-01-11T21:21:54.2146922Z + [[ distributed == \d\e\f\a\u\l\t ]] 2023-01-11T21:21:54.2147230Z + [[ distributed == \d\i\s\t\r\i\b\u\t\e\d ]] 2023-01-11T21:21:54.2147645Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 == *rocm* ]] 2023-01-11T21:21:54.2147949Z + [[ distributed == \s\l\o\w ]] 2023-01-11T21:21:54.2148367Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 == *slow-gradcheck* ]] 2023-01-11T21:21:54.2148811Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 == *cuda* ]] 2023-01-11T21:21:54.2149153Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2023-01-11T21:21:54.2149481Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2023-01-11T21:21:54.2149780Z + [[ distributed == *crossref* ]] 2023-01-11T21:21:54.2150043Z + [[ distributed == *dynamo* ]] 2023-01-11T21:21:54.2150314Z + [[ distributed == *inductor* ]] 2023-01-11T21:21:54.2150700Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 == *rocm* ]] 2023-01-11T21:21:54.2151110Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 != *-bazel-* ]] 2023-01-11T21:21:54.2151583Z + pip_install --user ninja==1.10.2 2023-01-11T21:21:54.2151979Z + pip install --progress-bar off --user ninja==1.10.2 2023-01-11T21:21:54.7592284Z Collecting ninja==1.10.2 2023-01-11T21:21:54.7834888Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2023-01-11T21:21:55.6532918Z Installing collected packages: ninja 2023-01-11T21:21:55.6630343Z  WARNING: The script ninja is installed in '/var/lib/jenkins/.local/bin' which is not on PATH. 
2023-01-11T21:21:55.6631214Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2023-01-11T21:21:55.6693426Z Successfully installed ninja-1.10.2 2023-01-11T21:21:55.7372365Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:21:55.7373441Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:21:55.7374526Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 == *asan* ]] 2023-01-11T21:21:55.7374992Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 == *-tsan* ]] 2023-01-11T21:21:55.7375358Z + [[ distributed == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2023-01-11T21:21:55.7375681Z + [[ distributed == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2023-01-11T21:21:55.7383295Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 == *tbb* ]] 2023-01-11T21:21:55.7397775Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 == *libtorch* ]] 2023-01-11T21:21:55.7398234Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 == *-bazel-* ]] 2023-01-11T21:21:55.7398669Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 == *-tsan* ]] 2023-01-11T21:21:55.7401225Z + cd test 2023-01-11T21:21:55.7401850Z + python -c 'import torch; print(torch.__config__.show())' 2023-01-11T21:21:57.3881739Z PyTorch built with: 2023-01-11T21:21:57.3882186Z - GCC 7.5 2023-01-11T21:21:57.3882511Z - C++ Version: 201703 2023-01-11T21:21:57.3883088Z - Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications 2023-01-11T21:21:57.3883647Z - Intel(R) MKL-DNN v2.7.2 (Git Hash fbec3e25a559ee252022ae066817b204e106a6ba) 2023-01-11T21:21:57.3884064Z - OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2023-01-11T21:21:57.3884446Z - LAPACK is enabled (usually provided by MKL) 2023-01-11T21:21:57.3884776Z - NNPACK is enabled 2023-01-11T21:21:57.3885080Z - CPU capability usage: AVX2 2023-01-11T21:21:57.3885404Z - CUDA Runtime 11.7 2023-01-11T21:21:57.3885803Z - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52 2023-01-11T21:21:57.3886128Z - CuDNN 8.5 2023-01-11T21:21:57.3886397Z - Magma 2.6.1 2023-01-11T21:21:57.3889527Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Werror -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 2023-01-11T21:21:57.3892064Z 2023-01-11T21:21:57.6211966Z + cd test 2023-01-11T21:21:57.6212569Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2023-01-11T21:21:59.1969361Z ATen/Parallel: 
2023-01-11T21:21:59.1995375Z at::get_num_threads() : 16 2023-01-11T21:21:59.1996041Z at::get_num_interop_threads() : 16 2023-01-11T21:21:59.1996633Z OpenMP 201511 (a.k.a. OpenMP 4.5) 2023-01-11T21:21:59.1997207Z omp_get_max_threads() : 16 2023-01-11T21:21:59.1998924Z Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications 2023-01-11T21:21:59.1999779Z mkl_get_max_threads() : 16 2023-01-11T21:21:59.2000743Z Intel(R) MKL-DNN v2.7.2 (Git Hash fbec3e25a559ee252022ae066817b204e106a6ba) 2023-01-11T21:21:59.2001519Z std::thread::hardware_concurrency() : 32 2023-01-11T21:21:59.2002122Z Environment variables: 2023-01-11T21:21:59.2002695Z OMP_NUM_THREADS : [not set] 2023-01-11T21:21:59.2003256Z MKL_NUM_THREADS : [not set] 2023-01-11T21:21:59.2003846Z ATen parallel backend: OpenMP 2023-01-11T21:21:59.2004223Z 2023-01-11T21:21:59.4263298Z + [[ distributed == *backward* ]] 2023-01-11T21:21:59.4263664Z + [[ distributed == *xla* ]] 2023-01-11T21:21:59.4263968Z + [[ distributed == \j\i\t\_\l\e\g\a\c\y ]] 2023-01-11T21:21:59.4264548Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 == *libtorch* ]] 2023-01-11T21:21:59.4264898Z + [[ distributed == distributed ]] 2023-01-11T21:21:59.4265175Z + install_filelock 2023-01-11T21:21:59.4265430Z + pip_install filelock 2023-01-11T21:21:59.4265796Z + pip install --progress-bar off filelock 2023-01-11T21:21:59.9439667Z Collecting filelock 2023-01-11T21:21:59.9629325Z Downloading filelock-3.9.0-py3-none-any.whl (9.7 kB) 2023-01-11T21:22:00.8620910Z Installing collected packages: filelock 2023-01-11T21:22:00.8991633Z Successfully installed filelock-3.9.0 2023-01-11T21:22:00.9645582Z + install_triton 2023-01-11T21:22:00.9645962Z + local commit 2023-01-11T21:22:00.9646228Z + [[ distributed == *rocm* ]] 2023-01-11T21:22:00.9649713Z ++ get_pinned_commit triton 2023-01-11T21:22:00.9650026Z ++ cat .github/ci_commit_pins/triton.txt 2023-01-11T21:22:00.9664747Z + 
commit=0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:22:00.9665488Z + pip_install --user git+https://github.com/openai/triton@0d7e7532279e45672555e344646f5c19c3972331#subdirectory=python 2023-01-11T21:22:00.9666212Z + pip install --progress-bar off --user git+https://github.com/openai/triton@0d7e7532279e45672555e344646f5c19c3972331#subdirectory=python 2023-01-11T21:22:01.4403625Z Collecting git+https://github.com/openai/triton@0d7e7532279e45672555e344646f5c19c3972331#subdirectory=python 2023-01-11T21:22:01.4409603Z Cloning https://github.com/openai/triton (to revision 0d7e7532279e45672555e344646f5c19c3972331) to /tmp/pip-req-build-reop_f9m 2023-01-11T21:22:01.4430184Z Running command git clone --filter=blob:none --quiet https://github.com/openai/triton /tmp/pip-req-build-reop_f9m 2023-01-11T21:22:02.5850436Z Running command git rev-parse -q --verify 'sha^0d7e7532279e45672555e344646f5c19c3972331' 2023-01-11T21:22:02.5872370Z Running command git fetch -q https://github.com/openai/triton 0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:22:03.0097557Z Running command git checkout -q 0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:22:03.4366756Z Resolved https://github.com/openai/triton to commit 0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:22:03.4368639Z Running command git submodule update --init --recursive -q 2023-01-11T21:22:04.0453430Z Preparing metadata (setup.py) ... 
done 2023-01-11T21:22:04.2821802Z Collecting cmake 2023-01-11T21:22:04.3082299Z Downloading cmake-3.25.0-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23.7 MB) 2023-01-11T21:22:04.6593486Z Requirement already satisfied: filelock in /opt/conda/lib/python3.10/site-packages (from triton==2.0.0) (3.9.0) 2023-01-11T21:22:04.6596940Z Requirement already satisfied: torch in /opt/conda/lib/python3.10/site-packages (from triton==2.0.0) (2.0.0a0+git8419ddd) 2023-01-11T21:22:04.6848789Z Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torch->triton==2.0.0) (1.11.1) 2023-01-11T21:22:04.6853207Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from torch->triton==2.0.0) (4.4.0) 2023-01-11T21:22:04.6858014Z Requirement already satisfied: networkx in /opt/conda/lib/python3.10/site-packages (from torch->triton==2.0.0) (2.6.3) 2023-01-11T21:22:04.7067829Z Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torch->triton==2.0.0) (1.2.1) 2023-01-11T21:22:04.7138137Z Building wheels for collected packages: triton 2023-01-11T21:22:56.8060829Z Building wheel for triton (setup.py) ... 
done 2023-01-11T21:22:56.8542034Z Created wheel for triton: filename=triton-2.0.0-cp310-cp310-linux_x86_64.whl size=15377935 sha256=4ab89babba02fb273fdc00f2beb27428ed486c51f61b7a9f0d3ba5c4e236aeb2 2023-01-11T21:22:56.8545556Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/3f/1d/23/1c2bc47d618a44f9c949aea4b7e355e737a1f1ed208f009295 2023-01-11T21:22:56.8563249Z Successfully built triton 2023-01-11T21:22:57.7761059Z Installing collected packages: cmake, triton 2023-01-11T21:23:00.9839894Z Successfully installed cmake-3.25.0 triton-2.0.0 2023-01-11T21:23:01.0894061Z + pip_install --user jinja2 2023-01-11T21:23:01.0894516Z + pip install --progress-bar off --user jinja2 2023-01-11T21:23:01.6081712Z Collecting jinja2 2023-01-11T21:23:01.6435991Z Downloading Jinja2-3.1.2-py3-none-any.whl (133 kB) 2023-01-11T21:23:01.6600195Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.10/site-packages (from jinja2) (2.1.1) 2023-01-11T21:23:02.5531032Z Installing collected packages: jinja2 2023-01-11T21:23:02.6564755Z Successfully installed jinja2-3.1.2 2023-01-11T21:23:02.7249700Z + test_distributed 2023-01-11T21:23:02.7250189Z + echo 'Testing distributed python tests' 2023-01-11T21:23:02.7252205Z Testing distributed python tests 2023-01-11T21:23:02.7252755Z + python test/run_test.py --distributed-tests --shard 3 3 --verbose 2023-01-11T21:23:04.8952378Z Ignoring disabled issues: [] 2023-01-11T21:23:04.9342206Z /var/lib/jenkins/workspace/test/run_test.py:1169: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. 
2023-01-11T21:23:04.9342787Z if torch.version.cuda is not None and LooseVersion(torch.version.cuda) >= "11.6": 2023-01-11T21:23:04.9349637Z Found test time stats from artifacts 2023-01-11T21:23:04.9368301Z Selected tests: 2023-01-11T21:23:04.9368614Z distributed/algorithms/quantization/test_quantization 2023-01-11T21:23:04.9368960Z distributed/test_distributed_spawn 2023-01-11T21:23:04.9369424Z distributed/rpc/test_faulty_agent 2023-01-11T21:23:04.9373762Z distributed/pipeline/sync/test_stream 2023-01-11T21:23:04.9374121Z distributed/pipeline/sync/test_phony 2023-01-11T21:23:04.9374455Z distributed/pipeline/sync/test_dependency 2023-01-11T21:23:04.9374815Z distributed/pipeline/sync/test_checkpoint 2023-01-11T21:23:04.9375149Z distributed/pipeline/sync/skip/test_verify_skippables 2023-01-11T21:23:04.9375514Z distributed/pipeline/sync/skip/test_portal 2023-01-11T21:23:04.9375862Z distributed/pipeline/sync/skip/test_gpipe 2023-01-11T21:23:04.9376192Z distributed/optim/test_apply_optimizer_in_backward 2023-01-11T21:23:04.9376531Z distributed/elastic/events/lib_test 2023-01-11T21:23:04.9376875Z distributed/_shard/test_replicated_tensor 2023-01-11T21:23:04.9377203Z distributed/_composable/test_checkpoint 2023-01-11T21:23:04.9377481Z distributed/test_nccl 2023-01-11T21:23:04.9377789Z distributed/checkpoint/test_traverse 2023-01-11T21:23:04.9378108Z distributed/nn/jit/test_instantiator 2023-01-11T21:23:04.9378398Z distributed/checkpoint/test_utils 2023-01-11T21:23:04.9378704Z distributed/_tensor/test_pointwise_ops 2023-01-11T21:23:04.9379016Z distributed/test_multi_threaded_pg 2023-01-11T21:23:04.9379553Z distributed/checkpoint/test_fsdp_optim_state 2023-01-11T21:23:04.9379877Z distributed/fsdp/test_fsdp_traversal 2023-01-11T21:23:04.9380186Z distributed/fsdp/test_fsdp_uneven 2023-01-11T21:23:04.9380498Z distributed/checkpoint/test_fsdp_model_state 2023-01-11T21:23:04.9382200Z distributed/_shard/sharded_tensor/ops/test_embedding 2023-01-11T21:23:04.9382587Z 
distributed/_shard/sharded_tensor/ops/test_chunk 2023-01-11T21:23:04.9382918Z distributed/test_c10d_error_logger 2023-01-11T21:23:04.9383432Z distributed/_shard/sharded_tensor/ops/test_init 2023-01-11T21:23:04.9383766Z distributed/fsdp/test_fsdp_pure_fp16 2023-01-11T21:23:04.9384116Z distributed/_shard/sharded_tensor/ops/test_binary_cmp 2023-01-11T21:23:04.9384472Z distributed/tensor/parallel/test_2d_parallel 2023-01-11T21:23:04.9384803Z distributed/_shard/sharded_tensor/ops/test_tensor_ops 2023-01-11T21:23:04.9385146Z distributed/fsdp/test_fsdp_memory 2023-01-11T21:23:04.9385460Z distributed/test_c10d_object_collectives 2023-01-11T21:23:04.9385790Z distributed/_tensor/test_tp_sharding_ops 2023-01-11T21:23:04.9386101Z distributed/tensor/parallel/test_tp_style 2023-01-11T21:23:04.9386421Z distributed/_tensor/test_redistribute 2023-01-11T21:23:04.9386748Z distributed/fsdp/test_fsdp_ignored_modules 2023-01-11T21:23:04.9387078Z distributed/_shard/sharded_tensor/ops/test_matrix_ops 2023-01-11T21:23:04.9387425Z distributed/fsdp/test_fsdp_flatten_params 2023-01-11T21:23:04.9387745Z distributed/fsdp/test_fsdp_exec_order 2023-01-11T21:23:04.9388065Z distributed/fsdp/test_fsdp_sharded_grad_scaler 2023-01-11T21:23:04.9388411Z distributed/fsdp/test_fsdp_freezing_weights 2023-01-11T21:23:04.9388742Z distributed/_composable/test_fully_shard 2023-01-11T21:23:04.9389019Z distributed/test_store 2023-01-11T21:23:04.9389302Z distributed/fsdp/test_fsdp_misc 2023-01-11T21:23:04.9389608Z distributed/fsdp/test_fsdp_checkpoint 2023-01-11T21:23:04.9389927Z distributed/optim/test_zero_redundancy_optimizer 2023-01-11T21:23:04.9390278Z distributed/fsdp/test_fsdp_summon_full_params 2023-01-11T21:23:04.9390592Z distributed/test_c10d_gloo 2023-01-11T21:23:04.9390888Z distributed/fsdp/test_fsdp_core 2023-01-11T21:23:04.9539947Z Prioritized test from test file changes. 
2023-01-11T21:23:04.9540264Z reordering tests for PR: 2023-01-11T21:23:04.9540790Z prioritized: ['distributed/fsdp/test_fsdp_ignored_modules'] 2023-01-11T21:23:04.9545534Z the rest: ['distributed/algorithms/quantization/test_quantization', 'distributed/test_distributed_spawn', 'distributed/rpc/test_faulty_agent', 'distributed/pipeline/sync/test_stream', 'distributed/pipeline/sync/test_phony', 'distributed/pipeline/sync/test_dependency', 'distributed/pipeline/sync/test_checkpoint', 'distributed/pipeline/sync/skip/test_verify_skippables', 'distributed/pipeline/sync/skip/test_portal', 'distributed/pipeline/sync/skip/test_gpipe', 'distributed/optim/test_apply_optimizer_in_backward', 'distributed/elastic/events/lib_test', 'distributed/_shard/test_replicated_tensor', 'distributed/_composable/test_checkpoint', 'distributed/test_nccl', 'distributed/checkpoint/test_traverse', 'distributed/nn/jit/test_instantiator', 'distributed/checkpoint/test_utils', 'distributed/_tensor/test_pointwise_ops', 'distributed/test_multi_threaded_pg', 'distributed/checkpoint/test_fsdp_optim_state', 'distributed/fsdp/test_fsdp_traversal', 'distributed/fsdp/test_fsdp_uneven', 'distributed/checkpoint/test_fsdp_model_state', 'distributed/_shard/sharded_tensor/ops/test_embedding', 'distributed/_shard/sharded_tensor/ops/test_chunk', 'distributed/test_c10d_error_logger', 'distributed/_shard/sharded_tensor/ops/test_init', 'distributed/fsdp/test_fsdp_pure_fp16', 'distributed/_shard/sharded_tensor/ops/test_binary_cmp', 'distributed/tensor/parallel/test_2d_parallel', 'distributed/_shard/sharded_tensor/ops/test_tensor_ops', 'distributed/fsdp/test_fsdp_memory', 'distributed/test_c10d_object_collectives', 'distributed/_tensor/test_tp_sharding_ops', 'distributed/tensor/parallel/test_tp_style', 'distributed/_tensor/test_redistribute', 'distributed/_shard/sharded_tensor/ops/test_matrix_ops', 'distributed/fsdp/test_fsdp_flatten_params', 'distributed/fsdp/test_fsdp_exec_order', 
'distributed/fsdp/test_fsdp_sharded_grad_scaler', 'distributed/fsdp/test_fsdp_freezing_weights', 'distributed/_composable/test_fully_shard', 'distributed/test_store', 'distributed/fsdp/test_fsdp_misc', 'distributed/fsdp/test_fsdp_checkpoint', 'distributed/optim/test_zero_redundancy_optimizer', 'distributed/fsdp/test_fsdp_summon_full_params', 'distributed/test_c10d_gloo', 'distributed/fsdp/test_fsdp_core'] 2023-01-11T21:23:04.9548672Z 2023-01-11T21:23:04.9549228Z Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/slow-tests.json to /var/lib/jenkins/workspace/test/.pytorch-slow-tests.json 2023-01-11T21:23:04.9765224Z Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json 2023-01-11T21:23:04.9939274Z parallel (file granularity) tests: 2023-01-11T21:23:04.9939538Z 2023-01-11T21:23:04.9939805Z serial (file granularity) tests: 2023-01-11T21:23:04.9940126Z distributed/fsdp/test_fsdp_ignored_modules 2023-01-11T21:23:04.9940466Z distributed/algorithms/quantization/test_quantization 2023-01-11T21:23:04.9940817Z distributed/test_distributed_spawn 2023-01-11T21:23:04.9941123Z distributed/rpc/test_faulty_agent 2023-01-11T21:23:04.9941415Z distributed/pipeline/sync/test_stream 2023-01-11T21:23:04.9941745Z distributed/pipeline/sync/test_phony 2023-01-11T21:23:04.9942091Z distributed/pipeline/sync/test_dependency 2023-01-11T21:23:04.9942428Z distributed/pipeline/sync/test_checkpoint 2023-01-11T21:23:04.9942760Z distributed/pipeline/sync/skip/test_verify_skippables 2023-01-11T21:23:04.9943113Z distributed/pipeline/sync/skip/test_portal 2023-01-11T21:23:04.9943455Z distributed/pipeline/sync/skip/test_gpipe 2023-01-11T21:23:04.9943787Z distributed/optim/test_apply_optimizer_in_backward 2023-01-11T21:23:04.9944129Z distributed/elastic/events/lib_test 2023-01-11T21:23:04.9944453Z distributed/_shard/test_replicated_tensor 
2023-01-11T21:23:04.9944763Z distributed/_composable/test_checkpoint 2023-01-11T21:23:04.9945059Z distributed/test_nccl 2023-01-11T21:23:04.9945350Z distributed/checkpoint/test_traverse 2023-01-11T21:23:04.9945642Z distributed/nn/jit/test_instantiator 2023-01-11T21:23:04.9945947Z distributed/checkpoint/test_utils 2023-01-11T21:23:04.9946258Z distributed/_tensor/test_pointwise_ops 2023-01-11T21:23:04.9946574Z distributed/test_multi_threaded_pg 2023-01-11T21:23:04.9946890Z distributed/checkpoint/test_fsdp_optim_state 2023-01-11T21:23:04.9947216Z distributed/fsdp/test_fsdp_traversal 2023-01-11T21:23:04.9947521Z distributed/fsdp/test_fsdp_uneven 2023-01-11T21:23:04.9947827Z distributed/checkpoint/test_fsdp_model_state 2023-01-11T21:23:04.9948181Z distributed/_shard/sharded_tensor/ops/test_embedding 2023-01-11T21:23:04.9948539Z distributed/_shard/sharded_tensor/ops/test_chunk 2023-01-11T21:23:04.9948848Z distributed/test_c10d_error_logger 2023-01-11T21:23:04.9949177Z distributed/_shard/sharded_tensor/ops/test_init 2023-01-11T21:23:04.9949517Z distributed/fsdp/test_fsdp_pure_fp16 2023-01-11T21:23:04.9949863Z distributed/_shard/sharded_tensor/ops/test_binary_cmp 2023-01-11T21:23:04.9950198Z distributed/tensor/parallel/test_2d_parallel 2023-01-11T21:23:04.9950551Z distributed/_shard/sharded_tensor/ops/test_tensor_ops 2023-01-11T21:23:04.9950979Z distributed/fsdp/test_fsdp_memory 2023-01-11T21:23:04.9951278Z distributed/test_c10d_object_collectives 2023-01-11T21:23:04.9951602Z distributed/_tensor/test_tp_sharding_ops 2023-01-11T21:23:04.9951930Z distributed/tensor/parallel/test_tp_style 2023-01-11T21:23:04.9952231Z distributed/_tensor/test_redistribute 2023-01-11T21:23:04.9952574Z distributed/_shard/sharded_tensor/ops/test_matrix_ops 2023-01-11T21:23:04.9952915Z distributed/fsdp/test_fsdp_flatten_params 2023-01-11T21:23:04.9953218Z distributed/fsdp/test_fsdp_exec_order 2023-01-11T21:23:04.9953706Z distributed/fsdp/test_fsdp_sharded_grad_scaler 2023-01-11T21:23:04.9954056Z 
distributed/fsdp/test_fsdp_freezing_weights 2023-01-11T21:23:04.9954370Z distributed/_composable/test_fully_shard 2023-01-11T21:23:04.9954667Z distributed/test_store 2023-01-11T21:23:04.9954952Z distributed/fsdp/test_fsdp_misc 2023-01-11T21:23:04.9955241Z distributed/fsdp/test_fsdp_checkpoint 2023-01-11T21:23:04.9955576Z distributed/optim/test_zero_redundancy_optimizer 2023-01-11T21:23:04.9956007Z distributed/fsdp/test_fsdp_summon_full_params 2023-01-11T21:23:04.9956330Z distributed/test_c10d_gloo 2023-01-11T21:23:04.9956605Z distributed/fsdp/test_fsdp_core 2023-01-11T21:23:07.2097243Z Ignoring disabled issues: [] 2023-01-11T21:23:07.2103814Z Ignoring disabled issues: [] 2023-01-11T21:23:07.6258592Z Running distributed/fsdp/test_fsdp_ignored_modules ... [2023-01-11 21:23:07.625232] 2023-01-11T21:23:07.6263216Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_ignored_modules.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:23:07.625984] 2023-01-11T21:23:34.1426402Z 2023-01-11T21:23:34.1429499Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_ignored_modules 2023-01-11T21:23:34.1435937Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_ignored_modules (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_ignored_modules_ed01u8ts) 2023-01-11T21:23:34.1436351Z 2023-01-11T21:23:34.1436472Z Running tests... 2023-01-11T21:23:34.1437283Z ---------------------------------------------------------------------- 2023-01-11T21:23:34.1437979Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_ignored_modules 2023-01-11T21:23:34.1438530Z test_diff_ignored_modules_across_ranks_pass_ignored_modules_to_root_False (__main__.TestFSDPIgnoredModules) 2023-01-11T21:23:34.1439159Z Tests ignoring different modules across ranks. ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:23:34.1439824Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 900 2023-01-11T21:23:34.1440636Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 901 2023-01-11T21:23:34.1441351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:23:34.1441837Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:23:34.1442432Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:23:34.1442913Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:23:34.1443483Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:23:34.1443934Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:23:34.1444516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:23:34.1444978Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:23:34.1445438Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:23:34.1445943Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:23:34.1446610Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:23:34.1447291Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:23:34.1447819Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:23:34.1448295Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:23:34.1448660Z dist init r=1, world=2 2023-01-11T21:23:34.1448897Z dist init r=0, world=2 2023-01-11T21:23:34.1449383Z ok (5.336s) 2023-01-11T21:23:34.1449811Z test_diff_ignored_modules_across_ranks_pass_ignored_modules_to_root_True (__main__.TestFSDPIgnoredModules) 2023-01-11T21:23:34.1450354Z Tests ignoring different modules across ranks. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 983 2023-01-11T21:23:34.1450868Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 984 2023-01-11T21:23:34.1451600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:23:34.1452054Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:23:34.1452650Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:23:34.1453888Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:23:34.1454755Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:23:34.1455199Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:23:34.1455783Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:23:34.1456317Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:23:34.1456786Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:23:34.1457292Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store 
for rank: 0 2023-01-11T21:23:34.1457936Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:23:34.1458624Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:23:34.1459152Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:23:34.1459632Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:23:34.1459974Z dist init r=0, world=2 2023-01-11T21:23:34.1460232Z dist init r=1, world=2 2023-01-11T21:23:34.1460477Z ok (3.813s) 2023-01-11T21:23:34.1460785Z test_ignored_modules_invalid (__main__.TestFSDPIgnoredModules) 2023-01-11T21:23:34.1461310Z Tests that passing an FSDP module as an ignored module or the ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1066 2023-01-11T21:23:34.1461837Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1067 2023-01-11T21:23:34.1462437Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:23:34.1462885Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:23:34.1463463Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:23:34.1463943Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:23:34.1464507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:23:34.1464961Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:23:34.1465536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:23:34.1465987Z 
warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:23:34.1466445Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:23:34.1466942Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:23:34.1467605Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:23:34.1468426Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:23:34.1468950Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:23:34.1469461Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:23:34.1469822Z dist init r=0, world=2 2023-01-11T21:23:34.1470147Z dist init r=1, world=2 2023-01-11T21:23:34.1470403Z ok (3.311s) 2023-01-11T21:23:34.1470732Z test_ignored_modules_nested (__main__.TestFSDPIgnoredModules) 2023-01-11T21:23:34.1471236Z Tests that passing a module with nested FSDP modules does not ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1145 2023-01-11T21:23:34.1471766Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1146 2023-01-11T21:23:34.1472389Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:23:34.1472849Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:23:34.1473411Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:23:34.1473884Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:23:34.1474473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:23:34.1474901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:23:34.1475479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:23:34.1475947Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:23:34.1476405Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:23:34.1476888Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:23:34.1477552Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:23:34.1478245Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:23:34.1478752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2023-01-11T21:23:34.1479229Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2023-01-11T21:23:34.1479585Z dist init r=0, world=2
2023-01-11T21:23:34.1479843Z dist init r=1, world=2
2023-01-11T21:23:34.1480069Z ok (3.812s)
2023-01-11T21:23:34.1480537Z test_ignored_modules_not_under_wrapped_root (__main__.TestFSDPIgnoredModules) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1228
2023-01-11T21:23:34.1481099Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1229
2023-01-11T21:23:34.1481695Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:23:34.1482147Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:23:34.1482726Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:23:34.1483205Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:23:34.1483771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:23:34.1484221Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:23:34.1484797Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:23:34.1485344Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:23:34.1485784Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2023-01-11T21:23:34.1486285Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2023-01-11T21:23:34.1486951Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2023-01-11T21:23:34.1487679Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2023-01-11T21:23:34.1488216Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2023-01-11T21:23:34.1488690Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2023-01-11T21:23:34.1489047Z dist init r=1, world=2
2023-01-11T21:23:34.1489281Z dist init r=0, world=2
2023-01-11T21:23:34.1489524Z ok (3.812s)
2023-01-11T21:23:34.1489866Z test_ignored_modules_transformer (__main__.TestFSDPIgnoredModules)
2023-01-11T21:23:34.1490516Z Tests that ignored modules' parameters are not flattened for a ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1311
2023-01-11T21:23:34.1491049Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1312
2023-01-11T21:23:34.1491657Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:23:34.1492111Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:23:34.1492670Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:23:34.1493527Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:23:34.1494113Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:23:34.1494549Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:23:34.1495130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:23:34.1495598Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:23:34.1496055Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2023-01-11T21:23:34.1496540Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2023-01-11T21:23:34.1497202Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2023-01-11T21:23:34.1497894Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2023-01-11T21:23:34.1498419Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2023-01-11T21:23:34.1498881Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2023-01-11T21:23:34.1499235Z dist init r=1, world=2
2023-01-11T21:23:34.1499494Z dist init r=0, world=2
2023-01-11T21:23:34.1499721Z ok (4.112s)
2023-01-11T21:23:34.1499870Z
2023-01-11T21:23:34.1500147Z ----------------------------------------------------------------------
2023-01-11T21:23:34.1500481Z Ran 6 tests in 24.197s
2023-01-11T21:23:34.1500645Z
2023-01-11T21:23:34.1500722Z OK
2023-01-11T21:23:34.1500863Z
2023-01-11T21:23:34.1500990Z Generating XML reports...
2023-01-11T21:23:34.1501632Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_ignored_modules/TEST-TestFSDPIgnoredModules-20230111212309.xml
2023-01-11T21:23:34.1502014Z
2023-01-11T21:23:34.1502490Z ##[endgroup]
2023-01-11T21:23:34.1503122Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_ignored_modules (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_ignored_modules_ed01u8ts)
2023-01-11T21:23:34.1503624Z
2023-01-11T21:23:34.1503952Z Running distributed/algorithms/quantization/test_quantization ... [2023-01-11 21:23:34.142120]
2023-01-11T21:23:34.1504299Z /usr/bin/mpiexec
2023-01-11T21:23:34.1504649Z MPI not available -- MPI backend tests will be skipped
2023-01-11T21:23:34.1505230Z Map different backends to different shards for distributed/algorithms/quantization/test_quantization: {'gloo': 1, 'nccl': 2}
2023-01-11T21:23:34.1505645Z Shard 3: test should be run in 1
2023-01-11T21:23:34.1506037Z Shard 3: nccl should be run in 2
2023-01-11T21:23:34.1506320Z Shard 3: gloo should be run in 1
2023-01-11T21:23:34.1506600Z Shard 3: ucc should be run in 1
2023-01-11T21:23:34.1507043Z Running distributed/test_distributed_spawn ... [2023-01-11 21:23:34.143696]
2023-01-11T21:23:34.1507341Z /usr/bin/mpiexec
2023-01-11T21:23:34.1507707Z MPI not available -- MPI backend tests will be skipped
2023-01-11T21:23:34.1508265Z Map different backends to different shards for distributed/test_distributed_spawn: {'gloo': 1, 'nccl': 2, 'ucc': 3}
2023-01-11T21:23:34.1508648Z Shard 3: test should be run in 1
2023-01-11T21:23:34.1508935Z Shard 3: nccl should be run in 2
2023-01-11T21:23:34.1509217Z Shard 3: gloo should be run in 1
2023-01-11T21:23:34.1509575Z Running distributed tests for the ucc backend with env init_method in shard 3 of 3
2023-01-11T21:23:34.1510332Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ...
[2023-01-11 21:23:34.145816] 2023-01-11T21:48:12.7117103Z 2023-01-11T21:48:12.7117626Z Expand the folded group to see the log file of distributed/test_distributed_spawn 2023-01-11T21:48:12.7121304Z ##[group]PRINTING LOG FILE of distributed/test_distributed_spawn (/var/lib/jenkins/workspace/test/test-reports/distributed-test_distributed_spawn_lx5pfkm6) 2023-01-11T21:48:12.7126489Z 2023-01-11T21:48:12.7180498Z , <__main__.TestDistBackendWithSpawn testMethod=test_3_level_hierarchical_model_averager>, <__main__.TestDistBackendWithSpawn testMethod=test_Backend_enum_class>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallelCPU>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallelCPU_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_2D_Input>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Channels_Last>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_No_Affine>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_non_default_stream>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_requires_grad>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_with_amp_and_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedSampler_padding>, <__main__.TestDistBackendWithSpawn 
testMethod=test_SyncBatchNorm_process_group>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync_allreduce_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync_allreduce_with_then_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_simple>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_with_empty>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_into_cat_tensor_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_into_stack_tensor_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_multigpu_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_object_default_pg>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_object_subgroup>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_v_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_max>, <__main__.TestDistBackendWithSpawn 
testMethod=test_all_reduce_coalesced_full_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_max_complex_unsupported>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_complex_unsupported_ops>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_full_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_full_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_full_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_full_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_multigpu_complex>, 
<__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_result_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_async>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_cuda_async>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split>, 
<__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_average_parameters>, <__main__.TestDistBackendWithSpawn testMethod=test_backend_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_backend_group>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_group>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_timeout_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_timeout_global>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_timeout_group>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_gloo>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_gloo_tags>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_mixed_backend_err>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_no_rank_zero_nccl>, <__main__.TestDistBackendWithSpawn 
testMethod=test_batch_isend_irecv_op_err>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_op_list_err>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_ring_exchange_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_self_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_tensor_err>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_group>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_object_list>, <__main__.TestDistBackendWithSpawn testMethod=test_compute_bucket_assignment_by_size_sparse_error_with_logger>, <__main__.TestDistBackendWithSpawn testMethod=test_compute_bucket_assignment_by_size_sparse_error_without_logger>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_apply_optim_in_backward>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_apply_optim_in_backward_grad_as_bucket_view_false>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_apply_optim_in_backward_ignored_params>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_broadcast_buffer>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_broadcast_buffer_via_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_buffer_hook_allreduce>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_buffer_hook_allreduce_return_future>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_build_debug_param_to_name_mapping>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_build_debug_param_to_name_mapping_requires_grad>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_comm_hook_logging>, <__main__.TestDistBackendWithSpawn 
testMethod=test_ddp_control_flow_different_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_control_flow_same_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_create_graph>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_device>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_forward_backward_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_grad_div_uneven_inputs>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_allreduce>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_allreduce_process_group>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_post_localSGD>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_powerSGD>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_pickling_powerSGD>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adam_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adam_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_False>, <__main__.TestDistBackendWithSpawn 
testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_ignore_params_arg>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_inference>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_join_model_equivalence>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_logging_data_cpu>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_logging_data_gpu>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_model_diff_num_params_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_model_diff_shape_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_multiple_nested_unused_params_err_ignore_params>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_multiple_nested_unused_params_error>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_namedtuple>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_new_tensor_in_fwd>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_new_tensor_in_fwd_static_graph>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_profiling_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_profiling_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_python_error_logged>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_returns_tensor_with_no_grad>, <__main__.TestDistBackendWithSpawn 
testMethod=test_ddp_shared_grad_acc_unused_params>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_static_graph_nested_types>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_sync_bn_training_vs_eval>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_sync_module_states>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_input_exception>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_input_join_disable>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_inputs>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_inputs_stop_iteration_sync_bn>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_unused_params_rebuild_buckets_exception>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_zero_output_features>, <__main__.TestDistBackendWithSpawn testMethod=test_destroy_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_destroy_group>, <__main__.TestDistBackendWithSpawn testMethod=test_detect_ddp_is_actually_static>, <__main__.TestDistBackendWithSpawn testMethod=test_different_graph_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_dump_DDP_relevant_env_vars>, <__main__.TestDistBackendWithSpawn testMethod=test_gather>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_checks>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_group>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_object>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_object_subgroup>, <__main__.TestDistBackendWithSpawn testMethod=test_get_backend>, <__main__.TestDistBackendWithSpawn testMethod=test_get_future>, <__main__.TestDistBackendWithSpawn testMethod=test_get_rank>, <__main__.TestDistBackendWithSpawn testMethod=test_get_rank_size_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_get_rank_size_group>, 
<__main__.TestDistBackendWithSpawn testMethod=test_invalid_static_graph>, <__main__.TestDistBackendWithSpawn testMethod=test_irecv>, <__main__.TestDistBackendWithSpawn testMethod=test_isend>, <__main__.TestDistBackendWithSpawn testMethod=test_isend_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_isend_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_allreduce_hang>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_allreduce_hang_wait_all_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_failure_order>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_gloo>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_gloo_rank_0_timeout>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_gloo_subgroup>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_wait_all_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_allgather>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_allreduce>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_broadcast>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_reduce>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_high_priority_stream>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_by_enumeration>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_by_enumeration_input_rank_exceeds_world_size>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_by_enumeration_negative_input_rank>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_group_size_exceeds_world_size>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_overlap_not_allowed>, <__main__.TestDistBackendWithSpawn 
testMethod=test_new_subgroups_world_size_not_divisible_by_group_size>, <__main__.TestDistBackendWithSpawn testMethod=test_output_unused_in_loss_dict_module>, <__main__.TestDistBackendWithSpawn testMethod=test_output_unused_in_loss_tuple_module>, <__main__.TestDistBackendWithSpawn testMethod=test_periodic_model_averager>, <__main__.TestDistBackendWithSpawn testMethod=test_periodic_model_averager_param_group>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity_with_hierarchical_sgd>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_step_reload>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_max>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_min>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_product>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_scatter_tensor_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_scatter_v_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum>, 
<__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum_cuda_twice>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum_twice>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_checks>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_group>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_object_list>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_any_source>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_any_source_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_any_source_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_nccl_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_nccl_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_with_tag>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_with_tag_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_with_tag_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_sparse_all_reduce_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_sparse_all_reduce_sum_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_stateless_api_with_ddp>, <__main__.TestDistBackendWithSpawn 
testMethod=test_static_graph_api_cpu>, <__main__.TestDistBackendWithSpawn testMethod=test_sync_bn_logged>, <__main__.TestDistBackendWithSpawn testMethod=test_undefined_grad_parity_unused_parameters>, <__main__.TestDistBackendWithSpawn testMethod=test_verify_model_across_rank_with_logger>, <__main__.TestDistBackendWithSpawn testMethod=test_verify_model_across_rank_without_logger>]> 2023-01-11T21:48:12.7231278Z test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7233654Z test_3_level_hierarchical_model_averager (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7234454Z test_Backend_enum_class (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7235207Z test_DistributedDataParallel (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7235939Z test_DistributedDataParallelCPU (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7236807Z test_DistributedDataParallelCPU_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7237699Z test_DistributedDataParallel_SyncBatchNorm (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7238552Z test_DistributedDataParallel_SyncBatchNorm_2D_Input (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7239490Z test_DistributedDataParallel_SyncBatchNorm_Channels_Last (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7240314Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7241170Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7242143Z test_DistributedDataParallel_SyncBatchNorm_No_Affine (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7243031Z test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7243979Z test_DistributedDataParallel_non_default_stream 
(__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7244538Z test_DistributedDataParallel_requires_grad (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7245026Z test_DistributedDataParallel_with_amp_and_grad_is_view (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7245776Z test_DistributedSampler_padding (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7246432Z test_SyncBatchNorm_process_group (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7247183Z test_accumulate_gradients_no_sync (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7247998Z test_accumulate_gradients_no_sync_allreduce_hook (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7248834Z test_accumulate_gradients_no_sync_allreduce_with_then_hook (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7249622Z test_accumulate_gradients_no_sync_grad_is_view (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7250202Z test_all_gather (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7250888Z test_all_gather_coalesced_complex (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7251615Z test_all_gather_coalesced_full_group (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7252409Z test_all_gather_coalesced_group (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7253352Z test_all_gather_coalesced_simple (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7254197Z test_all_gather_coalesced_with_empty (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7254628Z test_all_gather_complex (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7255240Z test_all_gather_cuda (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7255965Z test_all_gather_cuda_complex (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7256664Z test_all_gather_full_group (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7257508Z test_all_gather_group (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7258234Z test_all_gather_into_cat_tensor_cuda (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7258972Z test_all_gather_into_stack_tensor_cuda (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7259737Z test_all_gather_multigpu (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7260306Z test_all_gather_multigpu_complex (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7260984Z test_all_gather_object_default_pg (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7261749Z test_all_gather_object_subgroup (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7262298Z test_all_gather_v_cuda (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7263022Z test_all_reduce_coalesced_full_group_max (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7263827Z test_all_reduce_coalesced_full_group_min (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7264654Z test_all_reduce_coalesced_full_group_product (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7265481Z test_all_reduce_coalesced_full_group_sum (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7266029Z test_all_reduce_coalesced_group_max (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7266801Z test_all_reduce_coalesced_group_min (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7267276Z test_all_reduce_coalesced_group_product (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7267926Z test_all_reduce_coalesced_group_sum (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7268689Z test_all_reduce_coalesced_max (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7269504Z test_all_reduce_coalesced_max_complex_unsupported (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7270208Z test_all_reduce_coalesced_min (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7270718Z test_all_reduce_coalesced_product (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7271476Z test_all_reduce_coalesced_sum (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7272256Z test_all_reduce_complex_unsupported_ops (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7273012Z test_all_reduce_full_group_max (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7273784Z test_all_reduce_full_group_min (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7274546Z test_all_reduce_full_group_product (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7275068Z test_all_reduce_full_group_sum (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7275830Z test_all_reduce_group_max (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7276542Z test_all_reduce_group_min (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7277251Z test_all_reduce_group_product (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7277994Z test_all_reduce_group_sum (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7278683Z test_all_reduce_max (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7279213Z test_all_reduce_min (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7279588Z test_all_reduce_multigpu (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7280007Z test_all_reduce_multigpu_complex (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7280421Z test_all_reduce_product (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7280801Z test_all_reduce_result_cuda (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7281195Z test_all_reduce_sum (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7281661Z test_all_reduce_sum_async (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7282065Z test_all_reduce_sum_complex (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7282449Z test_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7282857Z test_all_reduce_sum_cuda_async (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7283277Z test_all_reduce_sum_cuda_complex (__main__.TestDistBackendWithSpawn)
2023-01-11T21:48:12.7283695Z test_all_to_all
(__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7284091Z test_all_to_all_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7284472Z test_all_to_all_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7284852Z test_all_to_all_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7285255Z test_all_to_all_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7285663Z test_all_to_all_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7286071Z test_all_to_all_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7286447Z test_all_to_all_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7286856Z test_all_to_all_single_equal_split (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7287293Z test_all_to_all_single_equal_split_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7287724Z test_all_to_all_single_equal_split_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7288175Z test_all_to_all_single_equal_split_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7288642Z test_all_to_all_single_equal_split_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7289102Z test_all_to_all_single_equal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7289537Z test_all_to_all_single_equal_split_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7289983Z test_all_to_all_single_equal_split_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7290431Z test_all_to_all_single_unequal_split (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7290860Z test_all_to_all_single_unequal_split_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7291306Z test_all_to_all_single_unequal_split_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7291766Z test_all_to_all_single_unequal_split_cuda_complex (__main__.TestDistBackendWithSpawn) 
2023-01-11T21:48:12.7292236Z test_all_to_all_single_unequal_split_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7292691Z test_all_to_all_single_unequal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7293994Z test_all_to_all_single_unequal_split_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7294451Z test_all_to_all_single_unequal_split_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7294871Z test_average_parameters (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7295274Z test_backend_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7295667Z test_backend_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7296041Z test_barrier (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7296396Z test_barrier_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7296788Z test_barrier_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7297193Z test_barrier_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7297574Z test_barrier_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7297966Z test_barrier_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7298373Z test_barrier_timeout_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7298762Z test_barrier_timeout_global (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7299175Z test_barrier_timeout_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7299583Z test_batch_isend_irecv_gloo (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7300101Z test_batch_isend_irecv_gloo_tags (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7300512Z test_batch_isend_irecv_mixed_backend_err (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7300931Z test_batch_isend_irecv_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7301352Z test_batch_isend_irecv_no_rank_zero_nccl 
(__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7301759Z test_batch_isend_irecv_op_err (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7302422Z test_batch_isend_irecv_op_list_err (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7302873Z test_batch_isend_irecv_ring_exchange_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7303280Z test_batch_isend_irecv_self_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7303704Z test_batch_isend_irecv_tensor_err (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7304095Z test_broadcast (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7304479Z test_broadcast_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7304864Z test_broadcast_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7305256Z test_broadcast_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7305656Z test_broadcast_multigpu (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7306040Z test_broadcast_object_list (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7306509Z test_compute_bucket_assignment_by_size_sparse_error_with_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7307029Z test_compute_bucket_assignment_by_size_sparse_error_without_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7307507Z test_ddp_apply_optim_in_backward (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7307952Z test_ddp_apply_optim_in_backward_grad_as_bucket_view_false (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7308435Z test_ddp_apply_optim_in_backward_ignored_params (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7308883Z test_ddp_broadcast_buffer (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7309452Z test_ddp_broadcast_buffer_via_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7309877Z test_ddp_buffer_hook_allreduce (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7310322Z 
test_ddp_buffer_hook_allreduce_return_future (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7310783Z test_ddp_build_debug_param_to_name_mapping (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7311236Z test_ddp_build_debug_param_to_name_mapping_requires_grad (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7311680Z test_ddp_comm_hook_logging (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7312108Z test_ddp_control_flow_different_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7312533Z test_ddp_control_flow_same_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7312947Z test_ddp_create_graph (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7313332Z test_ddp_device (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7313728Z test_ddp_forward_backward_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7314128Z test_ddp_grad_div_uneven_inputs (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7314547Z test_ddp_hook_parity_allreduce (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7314997Z test_ddp_hook_parity_allreduce_process_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7315422Z test_ddp_hook_parity_post_localSGD (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7315846Z test_ddp_hook_parity_powerSGD (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7316271Z test_ddp_hook_pickling_powerSGD (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7316722Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7317226Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7317874Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7318498Z 
test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7319084Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7319743Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7320357Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7320970Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7321572Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7322147Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7322691Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7323186Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7323642Z test_ddp_ignore_params_arg (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7324013Z test_ddp_inference (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7324414Z test_ddp_join_model_equivalence (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7324830Z test_ddp_logging_data_cpu (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7325211Z test_ddp_logging_data_gpu (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7325644Z test_ddp_model_diff_num_params_across_ranks (__main__.TestDistBackendWithSpawn) 
2023-01-11T21:48:12.7326091Z test_ddp_model_diff_shape_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7326538Z test_ddp_multiple_nested_unused_params_err_ignore_params (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7327011Z test_ddp_multiple_nested_unused_params_error (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7327433Z test_ddp_namedtuple (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7327829Z test_ddp_new_tensor_in_fwd (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7328228Z test_ddp_new_tensor_in_fwd_static_graph (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7328668Z test_ddp_profiling_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7329105Z test_ddp_profiling_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7329505Z test_ddp_python_error_logged (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7329929Z test_ddp_returns_tensor_with_no_grad (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7330366Z test_ddp_shared_grad_acc_unused_params (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7330800Z test_ddp_static_graph_nested_types (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7331207Z test_ddp_sync_bn_training_vs_eval (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7331620Z test_ddp_sync_module_states (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7332035Z test_ddp_uneven_input_exception (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7332443Z test_ddp_uneven_input_join_disable (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7333353Z test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7333959Z test_ddp_uneven_inputs_stop_iteration_sync_bn (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7334408Z test_ddp_unused_params_rebuild_buckets_exception (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7334947Z test_ddp_zero_output_features 
(__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7335351Z test_destroy_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7335737Z test_destroy_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7336121Z test_detect_ddp_is_actually_static (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7336544Z test_different_graph_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7337040Z test_dump_DDP_relevant_env_vars (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7337422Z test_gather (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7337793Z test_gather_checks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7338169Z test_gather_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7338534Z test_gather_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7338915Z test_gather_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7339292Z test_gather_object (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7339686Z test_gather_object_subgroup (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7340055Z test_get_backend (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7340419Z test_get_future (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7340784Z test_get_rank (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7341150Z test_get_rank_size_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7341555Z test_get_rank_size_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7341953Z test_invalid_static_graph (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7342308Z test_irecv (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7342659Z test_isend (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7343044Z test_isend_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7343450Z test_isend_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7343851Z 
test_monitored_barrier_allreduce_hang (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7344314Z test_monitored_barrier_allreduce_hang_wait_all_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7344768Z test_monitored_barrier_failure_order (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7345165Z test_monitored_barrier_gloo (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7345588Z test_monitored_barrier_gloo_rank_0_timeout (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7346030Z test_monitored_barrier_gloo_subgroup (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7346440Z test_monitored_barrier_wait_all_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7346865Z test_nccl_backend_bool_allgather (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7347286Z test_nccl_backend_bool_allreduce (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7347737Z test_nccl_backend_bool_broadcast (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7348138Z test_nccl_backend_bool_reduce (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7348554Z test_nccl_high_priority_stream (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7348954Z test_new_subgroups (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7349340Z test_new_subgroups_by_enumeration (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7349812Z test_new_subgroups_by_enumeration_input_rank_exceeds_world_size (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7350308Z test_new_subgroups_by_enumeration_negative_input_rank (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7350782Z test_new_subgroups_group_size_exceeds_world_size (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7351218Z test_new_subgroups_overlap_not_allowed (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7351680Z test_new_subgroups_world_size_not_divisible_by_group_size (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7352142Z 
test_output_unused_in_loss_dict_module (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7352620Z test_output_unused_in_loss_tuple_module (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7353044Z test_periodic_model_averager (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7353478Z test_periodic_model_averager_param_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7353925Z test_post_localSGD_optimizer_parity (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7354413Z test_post_localSGD_optimizer_parity_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7354907Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7355420Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7355886Z test_post_localSGD_optimizer_step_reload (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7356310Z test_reduce_full_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7356710Z test_reduce_full_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7357126Z test_reduce_full_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7357518Z test_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7357910Z test_reduce_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7358295Z test_reduce_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7358670Z test_reduce_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7359066Z test_reduce_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7359441Z test_reduce_max (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7359787Z test_reduce_min (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7360170Z test_reduce_multigpu (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7360557Z test_reduce_product 
(__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7360937Z test_reduce_scatter_tensor_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7361352Z test_reduce_scatter_v_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7361732Z test_reduce_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7362110Z test_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7362482Z test_reduce_sum_cuda_twice (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7362868Z test_reduce_sum_twice (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7363236Z test_scatter (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7363590Z test_scatter_checks (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7363974Z test_scatter_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7364350Z test_scatter_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7364717Z test_scatter_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7365118Z test_scatter_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7365501Z test_scatter_group (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7365895Z test_scatter_object_list (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7366252Z test_send_recv (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7366631Z test_send_recv_any_source (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7367056Z test_send_recv_any_source_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7367484Z test_send_recv_any_source_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7367919Z test_send_recv_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7368317Z test_send_recv_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7368706Z test_send_recv_nccl_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7369253Z test_send_recv_nccl_torch_profiler 
(__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7369670Z test_send_recv_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7370070Z test_send_recv_with_tag (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7370526Z test_send_recv_with_tag_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7370964Z test_send_recv_with_tag_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7371383Z test_sparse_all_reduce_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7371768Z test_sparse_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7372223Z test_stateless_api_with_ddp (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7372633Z test_static_graph_api_cpu (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7373905Z test_sync_bn_logged (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7374420Z test_undefined_grad_parity_unused_parameters (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7374878Z test_verify_model_across_rank_with_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7375326Z test_verify_model_across_rank_without_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7376045Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7376505Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7377088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7377563Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7377784Z 2023-01-11T21:48:12.7377897Z Running tests... 
2023-01-11T21:48:12.7378312Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7378840Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7379417Z test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7379981Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1464 2023-01-11T21:48:12.7380436Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1465 2023-01-11T21:48:12.7381048Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7381482Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7382062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7382537Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7383101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7383550Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7384124Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7384598Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7385036Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7385538Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7386206Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7386900Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7387407Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7387881Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7388412Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:48:12.7389362Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T21:48:12.7390026Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:48:12.7390915Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T21:48:12.7391579Z [1673472223.450211] [7e0e28e30a97:1464 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7392117Z [1673472223.454753] [7e0e28e30a97:1465 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.7392606Z [1673472223.456141] [7e0e28e30a97:1464 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7393091Z [1673472223.456141] [7e0e28e30a97:1464 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7393565Z [1673472223.459626] [7e0e28e30a97:1465 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7394034Z [1673472223.459626] [7e0e28e30a97:1465 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7394536Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:48:12.7395376Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T21:48:12.7396049Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:48:12.7396881Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T21:48:12.7397525Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:48:12.7398351Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T21:48:12.7399012Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:48:12.7399834Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 
2023-01-11T21:48:12.7400474Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:48:12.7401295Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T21:48:12.7401950Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T21:48:12.7402774Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T21:48:12.7403242Z ok (6.001s) 2023-01-11T21:48:12.7403396Z 2023-01-11T21:48:12.7403671Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7404005Z Ran 1 test in 6.001s 2023-01-11T21:48:12.7404170Z 2023-01-11T21:48:12.7404268Z OK 2023-01-11T21:48:12.7404384Z 2023-01-11T21:48:12.7404512Z Generating XML reports... 2023-01-11T21:48:12.7405121Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212338.xml 2023-01-11T21:48:12.7405909Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7406346Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7406924Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7407450Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7407690Z 2023-01-11T21:48:12.7407802Z Running tests... 
2023-01-11T21:48:12.7408194Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7408728Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7409259Z test_3_level_hierarchical_model_averager (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.003s) 2023-01-11T21:48:12.7409573Z 2023-01-11T21:48:12.7409819Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7410151Z Ran 1 test in 0.003s 2023-01-11T21:48:12.7410315Z 2023-01-11T21:48:12.7410427Z OK (skipped=1) 2023-01-11T21:48:12.7410584Z 2023-01-11T21:48:12.7410711Z Generating XML reports... 2023-01-11T21:48:12.7411297Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212346.xml 2023-01-11T21:48:12.7412017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7412476Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7413340Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7413826Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7414058Z 2023-01-11T21:48:12.7414178Z Running tests... 2023-01-11T21:48:12.7414593Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7415108Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7415618Z test_Backend_enum_class (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7416108Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1612 2023-01-11T21:48:12.7416560Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1613 2023-01-11T21:48:12.7417154Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7417605Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7418182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7418636Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7419229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7419676Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7420256Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7420709Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7421166Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7421665Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7422308Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7423002Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7423628Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7424106Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7424437Z ok (4.256s) 2023-01-11T21:48:12.7424588Z 2023-01-11T21:48:12.7424867Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7425285Z Ran 1 test in 4.256s 2023-01-11T21:48:12.7425464Z 2023-01-11T21:48:12.7425541Z OK 2023-01-11T21:48:12.7425676Z 2023-01-11T21:48:12.7425805Z Generating XML reports... 2023-01-11T21:48:12.7426422Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212349.xml 2023-01-11T21:48:12.7427176Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7427620Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7428199Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7428673Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7428906Z 2023-01-11T21:48:12.7429018Z Running tests... 2023-01-11T21:48:12.7429405Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7429945Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7430480Z test_DistributedDataParallel (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7431518Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77317 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.619s) 2023-01-11T21:48:12.7432058Z 2023-01-11T21:48:12.7432311Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7432642Z Ran 1 test in 1.619s 2023-01-11T21:48:12.7432806Z 2023-01-11T21:48:12.7432916Z OK (skipped=1) 2023-01-11T21:48:12.7433074Z 2023-01-11T21:48:12.7433201Z Generating XML reports... 2023-01-11T21:48:12.7433794Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212356.xml 2023-01-11T21:48:12.7434509Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7434963Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7435520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7435991Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7436226Z 2023-01-11T21:48:12.7436336Z Running tests... 2023-01-11T21:48:12.7436746Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7437258Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7437801Z test_DistributedDataParallelCPU (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7438321Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1749 2023-01-11T21:48:12.7438751Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1750 2023-01-11T21:48:12.7439368Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7439822Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7440399Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7440925Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7441515Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7441964Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7442589Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7443046Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7443505Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7444007Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7444648Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7445348Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7445874Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7446352Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7446819Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7447308Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7447840Z [1673472244.097056] [7e0e28e30a97:1749 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7448377Z [1673472244.898597] [7e0e28e30a97:1749 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7448838Z [1673472244.898597] [7e0e28e30a97:1749 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7449359Z [1673472244.099568] [7e0e28e30a97:1750 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7449858Z [1673472244.882986] [7e0e28e30a97:1750 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7450338Z [1673472244.882986] [7e0e28e30a97:1750 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7450671Z ok (5.554s) 2023-01-11T21:48:12.7450822Z 2023-01-11T21:48:12.7451100Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7451430Z Ran 1 test in 5.554s 2023-01-11T21:48:12.7451594Z 2023-01-11T21:48:12.7451670Z OK 2023-01-11T21:48:12.7451806Z 2023-01-11T21:48:12.7451932Z Generating XML reports... 
2023-01-11T21:48:12.7452547Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212400.xml 2023-01-11T21:48:12.7453631Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7454066Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7454649Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7455124Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7455358Z 2023-01-11T21:48:12.7455449Z Running tests... 2023-01-11T21:48:12.7455857Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7456389Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7456946Z test_DistributedDataParallelCPU_grad_is_view (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7457553Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1863 2023-01-11T21:48:12.7457999Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1864 2023-01-11T21:48:12.7458614Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7459066Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7459696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7460181Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7460766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7461193Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7461770Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7462240Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7462700Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7463179Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7463844Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7464535Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7465061Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7465515Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7465992Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7466485Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7466993Z [1673472252.131176] [7e0e28e30a97:1863 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7467510Z [1673472252.914575] [7e0e28e30a97:1863 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7467989Z [1673472252.914575] [7e0e28e30a97:1863 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7468504Z [1673472252.152285] [7e0e28e30a97:1864 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7468988Z [1673472252.926812] [7e0e28e30a97:1864 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7469458Z [1673472252.926812] [7e0e28e30a97:1864 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7469803Z ok (5.429s) 2023-01-11T21:48:12.7469953Z 2023-01-11T21:48:12.7470235Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7470551Z Ran 1 test in 5.429s 2023-01-11T21:48:12.7470718Z 2023-01-11T21:48:12.7470816Z OK 2023-01-11T21:48:12.7470954Z 2023-01-11T21:48:12.7471083Z Generating XML reports... 
2023-01-11T21:48:12.7471674Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212408.xml 2023-01-11T21:48:12.7472392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7472846Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7473428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7473956Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7474186Z 2023-01-11T21:48:12.7474298Z Running tests... 2023-01-11T21:48:12.7474718Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7475229Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7475834Z test_DistributedDataParallel_SyncBatchNorm (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7476373Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1977 2023-01-11T21:48:12.7476824Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1978 2023-01-11T21:48:12.7477421Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7477879Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7478464Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7478939Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7479501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7479954Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7480533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7480983Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7481437Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7481934Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7482597Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7483272Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7483799Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7484279Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7484806Z [1673472260.844826] [7e0e28e30a97:1978 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7485297Z [1673472260.850351] [7e0e28e30a97:1978 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7485774Z [1673472260.850351] [7e0e28e30a97:1978 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7486290Z [1673472260.836372] [7e0e28e30a97:1977 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7486792Z [1673472260.841998] [7e0e28e30a97:1977 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7487252Z [1673472260.841998] [7e0e28e30a97:1977 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7487733Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7488224Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7488709Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7489174Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7489720Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7490194Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:48:12.7490650Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7491165Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7491642Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7492114Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7492571Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7493426Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7493775Z ok (6.326s) 2023-01-11T21:48:12.7493933Z 2023-01-11T21:48:12.7494204Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7494537Z Ran 1 test in 6.326s 2023-01-11T21:48:12.7494702Z 2023-01-11T21:48:12.7494800Z OK 2023-01-11T21:48:12.7494935Z 2023-01-11T21:48:12.7495064Z Generating XML reports... 2023-01-11T21:48:12.7495653Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212416.xml 2023-01-11T21:48:12.7496375Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7496833Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7497395Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7497871Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7498104Z 2023-01-11T21:48:12.7498222Z Running tests... 
2023-01-11T21:48:12.7498631Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7499144Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7499708Z test_DistributedDataParallel_SyncBatchNorm_2D_Input (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7500249Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2095 2023-01-11T21:48:12.7500686Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2096 2023-01-11T21:48:12.7501297Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7501755Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7502334Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7502792Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7503377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7503825Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7504401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7504854Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7505311Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7505810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7506452Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7507146Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7507778Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7508250Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7508816Z [1673472269.735630] [7e0e28e30a97:2096 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7509336Z [1673472269.741478] [7e0e28e30a97:2096 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7509814Z [1673472269.741478] [7e0e28e30a97:2096 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7510329Z [1673472269.735378] [7e0e28e30a97:2095 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7510818Z [1673472269.741441] [7e0e28e30a97:2095 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7511289Z [1673472269.741441] [7e0e28e30a97:2095 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7511768Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7512260Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7512722Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7513207Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:48:12.7513558Z ok (5.534s) 2023-01-11T21:48:12.7513709Z 2023-01-11T21:48:12.7513977Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7514307Z Ran 1 test in 5.534s 2023-01-11T21:48:12.7514475Z 2023-01-11T21:48:12.7514572Z OK 2023-01-11T21:48:12.7514707Z 2023-01-11T21:48:12.7514836Z Generating XML reports... 2023-01-11T21:48:12.7515424Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212425.xml 2023-01-11T21:48:12.7516139Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7516601Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7517162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7517636Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7517868Z 2023-01-11T21:48:12.7517980Z Running tests... 2023-01-11T21:48:12.7518386Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7518898Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7519476Z test_DistributedDataParallel_SyncBatchNorm_Channels_Last (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7520022Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2213 2023-01-11T21:48:12.7520473Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2214 2023-01-11T21:48:12.7521065Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7521519Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7522098Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7522550Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7523130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7523658Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7524242Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7524692Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7525196Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7525696Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7526341Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7527033Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7527559Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7528041Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7528540Z [1673472277.779515] [7e0e28e30a97:2213 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7529056Z [1673472277.785808] [7e0e28e30a97:2213 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7529538Z [1673472277.785808] [7e0e28e30a97:2213 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7530051Z [1673472277.780972] [7e0e28e30a97:2214 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7530533Z [1673472277.786563] [7e0e28e30a97:2214 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7531012Z [1673472277.786563] [7e0e28e30a97:2214 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7531488Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7531974Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7532443Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7533343Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7533830Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7534303Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:48:12.7534760Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7535241Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7535588Z ok (5.626s) 2023-01-11T21:48:12.7535739Z 2023-01-11T21:48:12.7536002Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7536332Z Ran 1 test in 5.626s 2023-01-11T21:48:12.7536495Z 2023-01-11T21:48:12.7536591Z OK 2023-01-11T21:48:12.7536727Z 2023-01-11T21:48:12.7536853Z Generating XML reports... 2023-01-11T21:48:12.7537445Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212433.xml 2023-01-11T21:48:12.7538157Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7538612Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7539169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7539733Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7539965Z 2023-01-11T21:48:12.7540077Z Running tests... 2023-01-11T21:48:12.7540488Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7540999Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7541654Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7542229Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2331 2023-01-11T21:48:12.7542662Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2332 2023-01-11T21:48:12.7543278Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7543733Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7544318Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7544771Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7545356Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7545802Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7546384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7546834Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7547292Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7547790Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7548455Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7549156Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7549682Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7550159Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7550663Z [1673472285.978473] [7e0e28e30a97:2331 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7551178Z [1673472285.984868] [7e0e28e30a97:2331 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7551662Z [1673472285.984868] [7e0e28e30a97:2331 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7552185Z [1673472285.981405] [7e0e28e30a97:2332 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7552669Z [1673472285.986308] [7e0e28e30a97:2332 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7553138Z [1673472285.986308] [7e0e28e30a97:2332 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7553622Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7554109Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7554446Z ok (5.633s) 2023-01-11T21:48:12.7554596Z 2023-01-11T21:48:12.7554874Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7555207Z Ran 1 test in 5.633s 2023-01-11T21:48:12.7555373Z 2023-01-11T21:48:12.7555522Z OK 2023-01-11T21:48:12.7555660Z 2023-01-11T21:48:12.7555790Z Generating XML reports... 
2023-01-11T21:48:12.7556407Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212441.xml 2023-01-11T21:48:12.7557123Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7557561Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7558197Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7558678Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7558911Z 2023-01-11T21:48:12.7559003Z Running tests... 2023-01-11T21:48:12.7559415Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7559947Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7560540Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7561079Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2449 2023-01-11T21:48:12.7561533Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2450 2023-01-11T21:48:12.7562148Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7562603Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7563162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7563633Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7564215Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7564647Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7565224Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7565690Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7566147Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7566627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7567287Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7567981Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7568508Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7568970Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7569489Z [1673472294.188506] [7e0e28e30a97:2450 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7570000Z [1673472294.195139] [7e0e28e30a97:2450 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7570465Z [1673472294.195139] [7e0e28e30a97:2450 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7570984Z [1673472294.182078] [7e0e28e30a97:2449 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7571490Z [1673472294.188127] [7e0e28e30a97:2449 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7571962Z [1673472294.188127] [7e0e28e30a97:2449 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7572487Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7573578Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7574067Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7574620Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:48:12.7574964Z ok (6.147s) 2023-01-11T21:48:12.7575114Z 2023-01-11T21:48:12.7575399Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7575731Z Ran 1 test in 6.148s 2023-01-11T21:48:12.7575895Z 2023-01-11T21:48:12.7575972Z OK 2023-01-11T21:48:12.7576107Z 2023-01-11T21:48:12.7576237Z Generating XML reports... 2023-01-11T21:48:12.7576846Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212449.xml 2023-01-11T21:48:12.7577565Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7577999Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7578577Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7579052Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7579287Z 2023-01-11T21:48:12.7579399Z Running tests... 2023-01-11T21:48:12.7579785Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7580315Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7580879Z test_DistributedDataParallel_SyncBatchNorm_No_Affine (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7581403Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2567 2023-01-11T21:48:12.7581851Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2568 2023-01-11T21:48:12.7582466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7582921Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7583488Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7583960Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7584540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7584968Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7585539Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7586014Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7586473Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7586952Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7587613Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7588311Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7588837Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7589291Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7589806Z [1673472302.931799] [7e0e28e30a97:2568 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7590405Z [1673472302.937253] [7e0e28e30a97:2568 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7590882Z [1673472302.937253] [7e0e28e30a97:2568 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7591421Z [1673472302.927409] [7e0e28e30a97:2567 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7591928Z [1673472302.933616] [7e0e28e30a97:2567 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7592396Z [1673472302.933616] [7e0e28e30a97:2567 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7592875Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7593347Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7593829Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7594311Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:48:12.7594642Z ok (6.034s) 2023-01-11T21:48:12.7594795Z 2023-01-11T21:48:12.7595082Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7595417Z Ran 1 test in 6.034s 2023-01-11T21:48:12.7595587Z 2023-01-11T21:48:12.7595684Z OK 2023-01-11T21:48:12.7595801Z 2023-01-11T21:48:12.7595927Z Generating XML reports... 2023-01-11T21:48:12.7596538Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212458.xml 2023-01-11T21:48:12.7597249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7597687Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7598267Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7598738Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7598970Z 2023-01-11T21:48:12.7599084Z Running tests... 2023-01-11T21:48:12.7599479Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7600013Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7600604Z test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7601144Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2685 2023-01-11T21:48:12.7601595Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2686 2023-01-11T21:48:12.7602205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7602660Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7603220Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7603696Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7604281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7604707Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7605280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7605746Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7606302Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7606782Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7607446Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7608187Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7608720Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7609175Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7609691Z [1673472311.449184] [7e0e28e30a97:2686 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7610213Z [1673472311.456103] [7e0e28e30a97:2686 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7610694Z [1673472311.456103] [7e0e28e30a97:2686 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7611187Z [1673472311.445858] [7e0e28e30a97:2685 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7611694Z [1673472311.451693] [7e0e28e30a97:2685 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7612164Z [1673472311.451693] [7e0e28e30a97:2685 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7612644Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7613321Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7613809Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.7614291Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:48:12.7614622Z ok (5.528s) 2023-01-11T21:48:12.7614773Z 2023-01-11T21:48:12.7615057Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7615389Z Ran 1 test in 5.528s 2023-01-11T21:48:12.7615559Z 2023-01-11T21:48:12.7615656Z OK 2023-01-11T21:48:12.7615772Z 2023-01-11T21:48:12.7615900Z Generating XML reports... 2023-01-11T21:48:12.7616512Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212506.xml 2023-01-11T21:48:12.7617227Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7617663Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7618248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7618719Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7618950Z 2023-01-11T21:48:12.7619062Z Running tests... 2023-01-11T21:48:12.7619447Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7619983Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7620539Z test_DistributedDataParallel_non_default_stream (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7621614Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/76428 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.624s) 2023-01-11T21:48:12.7622238Z 2023-01-11T21:48:12.7622500Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7622833Z Ran 1 test in 1.624s 2023-01-11T21:48:12.7622998Z 2023-01-11T21:48:12.7623110Z OK (skipped=1) 2023-01-11T21:48:12.7623268Z 2023-01-11T21:48:12.7623397Z Generating XML reports... 2023-01-11T21:48:12.7624057Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212514.xml 2023-01-11T21:48:12.7624793Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7625250Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7625810Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7626286Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7626521Z 2023-01-11T21:48:12.7626633Z Running tests... 2023-01-11T21:48:12.7627041Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7627555Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7628105Z test_DistributedDataParallel_requires_grad (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7628639Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2837 2023-01-11T21:48:12.7629070Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2838 2023-01-11T21:48:12.7629682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7630136Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7630713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7631175Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7631758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7632208Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7632766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7633240Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7633700Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7634199Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7634840Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7635532Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7636058Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7636535Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7636865Z ok (4.332s) 2023-01-11T21:48:12.7637016Z 2023-01-11T21:48:12.7637297Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7637629Z Ran 1 test in 4.333s 2023-01-11T21:48:12.7637794Z 2023-01-11T21:48:12.7637872Z OK 2023-01-11T21:48:12.7638009Z 2023-01-11T21:48:12.7638136Z Generating XML reports... 2023-01-11T21:48:12.7638745Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212519.xml 2023-01-11T21:48:12.7639462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7639964Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7640548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7641018Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7641250Z 2023-01-11T21:48:12.7641342Z Running tests... 2023-01-11T21:48:12.7641803Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7642348Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7642913Z test_DistributedDataParallel_with_amp_and_grad_is_view (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7643969Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77294 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.591s) 2023-01-11T21:48:12.7644486Z 2023-01-11T21:48:12.7644757Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7645092Z Ran 1 test in 1.592s 2023-01-11T21:48:12.7645252Z 2023-01-11T21:48:12.7645362Z OK (skipped=1) 2023-01-11T21:48:12.7645519Z 2023-01-11T21:48:12.7645629Z Generating XML reports... 2023-01-11T21:48:12.7646240Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212525.xml 2023-01-11T21:48:12.7646953Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7647408Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7647968Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7648444Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7648674Z 2023-01-11T21:48:12.7648786Z Running tests... 2023-01-11T21:48:12.7649200Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7649732Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7650265Z test_DistributedSampler_padding (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7650778Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2974 2023-01-11T21:48:12.7651208Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2975 2023-01-11T21:48:12.7651813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7652267Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7652829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7653510Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7654095Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7654541Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7655101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7655568Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7656024Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7656522Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7657166Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7657958Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7658487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7658945Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7659532Z [1673472334.612914] [7e0e28e30a97:2974 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7660053Z [1673472334.618917] [7e0e28e30a97:2974 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7660533Z [1673472334.618917] [7e0e28e30a97:2974 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7661031Z [1673472334.614701] [7e0e28e30a97:2975 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7661545Z [1673472334.620407] [7e0e28e30a97:2975 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7662017Z [1673472334.620407] [7e0e28e30a97:2975 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7662365Z ok (5.470s) 2023-01-11T21:48:12.7662518Z 2023-01-11T21:48:12.7662781Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7663110Z Ran 1 test in 5.470s 2023-01-11T21:48:12.7663275Z 2023-01-11T21:48:12.7663372Z OK 2023-01-11T21:48:12.7663508Z 2023-01-11T21:48:12.7663639Z Generating XML reports... 
2023-01-11T21:48:12.7664228Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212529.xml 2023-01-11T21:48:12.7664947Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7665403Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7665961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7666432Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7666664Z 2023-01-11T21:48:12.7666780Z Running tests... 2023-01-11T21:48:12.7667190Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7667696Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7668199Z test_SyncBatchNorm_process_group (__main__.TestDistBackendWithSpawn) ... skip: no torchvision (0.002s) 2023-01-11T21:48:12.7668488Z 2023-01-11T21:48:12.7668754Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7669071Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7669235Z 2023-01-11T21:48:12.7669346Z OK (skipped=1) 2023-01-11T21:48:12.7669506Z 2023-01-11T21:48:12.7669633Z Generating XML reports... 
2023-01-11T21:48:12.7670237Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212537.xml 2023-01-11T21:48:12.7670931Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7671384Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7671961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7672414Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7672645Z 2023-01-11T21:48:12.7672757Z Running tests... 2023-01-11T21:48:12.7673161Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7673761Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7674199Z test_accumulate_gradients_no_sync (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7674707Z Runs _test_accumulate_gradients_no_sync using default inputs ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T21:48:12.7675008Z 2023-01-11T21:48:12.7675330Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7675673Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7675818Z 2023-01-11T21:48:12.7675928Z OK (skipped=1) 2023-01-11T21:48:12.7676083Z 2023-01-11T21:48:12.7676210Z Generating XML reports... 
2023-01-11T21:48:12.7676817Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212540.xml 2023-01-11T21:48:12.7677510Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7677970Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7678545Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7679022Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7679233Z 2023-01-11T21:48:12.7679345Z Running tests... 2023-01-11T21:48:12.7679754Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7680281Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7680741Z test_accumulate_gradients_no_sync_allreduce_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7681271Z Runs multiple iterations on _test_accumulate_gradients_no_sync ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T21:48:12.7681577Z 2023-01-11T21:48:12.7681843Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7682179Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7682323Z 2023-01-11T21:48:12.7682435Z OK (skipped=1) 2023-01-11T21:48:12.7682592Z 2023-01-11T21:48:12.7682720Z Generating XML reports... 
2023-01-11T21:48:12.7683324Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212542.xml 2023-01-11T21:48:12.7684018Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7684475Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7685051Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7685522Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7685756Z 2023-01-11T21:48:12.7685847Z Running tests... 2023-01-11T21:48:12.7686257Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7686792Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7687269Z test_accumulate_gradients_no_sync_allreduce_with_then_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7687832Z Runs multiple iterations on _test_accumulate_gradients_no_sync using allreduce ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T21:48:12.7688159Z 2023-01-11T21:48:12.7688428Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7688763Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7688925Z 2023-01-11T21:48:12.7689017Z OK (skipped=1) 2023-01-11T21:48:12.7689173Z 2023-01-11T21:48:12.7689299Z Generating XML reports... 
2023-01-11T21:48:12.7689905Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212545.xml 2023-01-11T21:48:12.7690618Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7691117Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7691700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7692171Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7692403Z 2023-01-11T21:48:12.7692492Z Running tests... 2023-01-11T21:48:12.7693152Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7693718Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7694198Z test_accumulate_gradients_no_sync_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T21:48:12.7694701Z Runs _test_accumulate_gradients_no_sync using default inputs ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T21:48:12.7695005Z 2023-01-11T21:48:12.7695276Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7695608Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7695775Z 2023-01-11T21:48:12.7695865Z OK (skipped=1) 2023-01-11T21:48:12.7696023Z 2023-01-11T21:48:12.7696150Z Generating XML reports... 
2023-01-11T21:48:12.7696753Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212547.xml 2023-01-11T21:48:12.7697467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7697901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7698477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7698948Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7699178Z 2023-01-11T21:48:12.7699290Z Running tests... 2023-01-11T21:48:12.7699682Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7700214Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7700710Z test_all_gather (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7701173Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3253 2023-01-11T21:48:12.7701624Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3254 2023-01-11T21:48:12.7702233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7702684Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7703245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7703717Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7704308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7704738Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7705314Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7705782Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7706246Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7706723Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7707379Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7708074Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7708692Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7709263Z STAGE:2023-01-11 21:25:53 3254:3254 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7709748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7710376Z STAGE:2023-01-11 21:25:53 3253:3253 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7710891Z [1673472353.746061] [7e0e28e30a97:3254 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7711408Z [1673472354.772855] [7e0e28e30a97:3254 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7711886Z [1673472354.772855] [7e0e28e30a97:3254 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7712410Z [1673472353.724655] [7e0e28e30a97:3253 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.7712893Z [1673472354.778731] [7e0e28e30a97:3253 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7713369Z [1673472354.778731] [7e0e28e30a97:3253 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7714172Z STAGE:2023-01-11 21:25:55 3254:3254 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:25:55 3253:3253 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7714564Z 2023-01-11T21:48:12.7714918Z STAGE:2023-01-11 21:25:55 3254:3254 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7715520Z STAGE:2023-01-11 21:25:55 3253:3253 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7716087Z STAGE:2023-01-11 21:25:55 3253:3253 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7716654Z STAGE:2023-01-11 21:25:55 3254:3254 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7717233Z STAGE:2023-01-11 21:25:55 3253:3253 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7717798Z STAGE:2023-01-11 21:25:55 3254:3254 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7718389Z STAGE:2023-01-11 21:25:55 3253:3253 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7718990Z STAGE:2023-01-11 21:25:55 3254:3254 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7719351Z ok (5.821s) 2023-01-11T21:48:12.7719502Z 2023-01-11T21:48:12.7719756Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7720088Z Ran 1 test in 5.821s 2023-01-11T21:48:12.7720256Z 2023-01-11T21:48:12.7720352Z OK 2023-01-11T21:48:12.7720487Z 2023-01-11T21:48:12.7720595Z Generating XML reports... 
2023-01-11T21:48:12.7721204Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212549.xml 2023-01-11T21:48:12.7721920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7722375Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7722935Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7723406Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7723637Z 2023-01-11T21:48:12.7723750Z Running tests... 2023-01-11T21:48:12.7724141Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7724671Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7725288Z test_all_gather_coalesced_complex (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T21:48:12.7725608Z 2023-01-11T21:48:12.7725877Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7726184Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7726347Z 2023-01-11T21:48:12.7726460Z OK (skipped=1) 2023-01-11T21:48:12.7726667Z 2023-01-11T21:48:12.7726801Z Generating XML reports... 
2023-01-11T21:48:12.7727457Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212558.xml 2023-01-11T21:48:12.7728152Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7728603Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7729181Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7729638Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7729871Z 2023-01-11T21:48:12.7729982Z Running tests... 2023-01-11T21:48:12.7730389Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7730918Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7731444Z test_all_gather_coalesced_full_group (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T21:48:12.7731768Z 2023-01-11T21:48:12.7732033Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7732363Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7732527Z 2023-01-11T21:48:12.7732619Z OK (skipped=1) 2023-01-11T21:48:12.7732777Z 2023-01-11T21:48:12.7733164Z Generating XML reports... 
2023-01-11T21:48:12.7733783Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212600.xml 2023-01-11T21:48:12.7734499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7734933Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7735512Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7735980Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7736212Z 2023-01-11T21:48:12.7736324Z Running tests... 2023-01-11T21:48:12.7736709Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7737239Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7737775Z test_all_gather_coalesced_group (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T21:48:12.7738094Z 2023-01-11T21:48:12.7738341Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7738670Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7738833Z 2023-01-11T21:48:12.7738943Z OK (skipped=1) 2023-01-11T21:48:12.7739101Z 2023-01-11T21:48:12.7739226Z Generating XML reports... 
2023-01-11T21:48:12.7739813Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212602.xml 2023-01-11T21:48:12.7740528Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7740978Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7741540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7742017Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7742359Z 2023-01-11T21:48:12.7742469Z Running tests... 2023-01-11T21:48:12.7742881Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7743389Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7743929Z test_all_gather_coalesced_simple (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T21:48:12.7744247Z 2023-01-11T21:48:12.7744579Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7744918Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7745062Z 2023-01-11T21:48:12.7745174Z OK (skipped=1) 2023-01-11T21:48:12.7745329Z 2023-01-11T21:48:12.7745456Z Generating XML reports... 
2023-01-11T21:48:12.7746062Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212605.xml 2023-01-11T21:48:12.7746755Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7746940Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7747323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7747517Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7747537Z 2023-01-11T21:48:12.7747648Z Running tests... 2023-01-11T21:48:12.7747914Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7748227Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7748523Z test_all_gather_coalesced_with_empty (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.003s) 2023-01-11T21:48:12.7748542Z 2023-01-11T21:48:12.7748786Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7748904Z Ran 1 test in 0.003s 2023-01-11T21:48:12.7748924Z 2023-01-11T21:48:12.7749034Z OK (skipped=1) 2023-01-11T21:48:12.7749053Z 2023-01-11T21:48:12.7749179Z Generating XML reports... 
2023-01-11T21:48:12.7749648Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212607.xml 2023-01-11T21:48:12.7750024Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7750207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7750590Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7750785Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7750805Z 2023-01-11T21:48:12.7750896Z Running tests... 2023-01-11T21:48:12.7751159Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7751475Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7751740Z test_all_gather_complex (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7751965Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3532 2023-01-11T21:48:12.7752181Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3533 2023-01-11T21:48:12.7752556Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7752734Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7753096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7753289Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7753660Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7753901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7754283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7754478Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7754774Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7755021Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7755425Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7755807Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7756040Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7756377Z STAGE:2023-01-11 21:26:13 3533:3533 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7756607Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7756936Z STAGE:2023-01-11 21:26:13 3532:3532 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7757220Z [1673472373.937487] [7e0e28e30a97:3532 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7757457Z [1673472374.993580] [7e0e28e30a97:3532 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7757699Z [1673472374.993580] [7e0e28e30a97:3532 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7757978Z [1673472373.960423] [7e0e28e30a97:3533 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.7758214Z [1673472374.991734] [7e0e28e30a97:3533 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7758435Z [1673472374.991734] [7e0e28e30a97:3533 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7758990Z STAGE:2023-01-11 21:26:15 3532:3532 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:26:15 3533:3533 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7759011Z 2023-01-11T21:48:12.7759361Z STAGE:2023-01-11 21:26:15 3533:3533 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7759708Z STAGE:2023-01-11 21:26:15 3532:3532 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7760030Z STAGE:2023-01-11 21:26:15 3532:3532 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7760354Z STAGE:2023-01-11 21:26:15 3533:3533 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7760685Z STAGE:2023-01-11 21:26:15 3532:3532 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7761017Z STAGE:2023-01-11 21:26:15 3533:3533 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7761363Z STAGE:2023-01-11 21:26:15 3532:3532 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7761706Z STAGE:2023-01-11 21:26:15 3533:3533 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7761794Z ok (5.852s) 2023-01-11T21:48:12.7761812Z 2023-01-11T21:48:12.7762079Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7762198Z Ran 1 test in 5.853s 2023-01-11T21:48:12.7762217Z 2023-01-11T21:48:12.7762313Z OK 2023-01-11T21:48:12.7762386Z 2023-01-11T21:48:12.7762517Z Generating XML reports... 
2023-01-11T21:48:12.7762972Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212610.xml 2023-01-11T21:48:12.7763346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7763526Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7763935Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7764136Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7764156Z 2023-01-11T21:48:12.7764266Z Running tests... 2023-01-11T21:48:12.7764536Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7764853Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7765123Z test_all_gather_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all gather (0.002s) 2023-01-11T21:48:12.7765143Z 2023-01-11T21:48:12.7765404Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7765521Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7765540Z 2023-01-11T21:48:12.7765651Z OK (skipped=1) 2023-01-11T21:48:12.7765670Z 2023-01-11T21:48:12.7765777Z Generating XML reports... 
2023-01-11T21:48:12.7766235Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212618.xml 2023-01-11T21:48:12.7766607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7766786Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7767167Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7767363Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7767383Z 2023-01-11T21:48:12.7767498Z Running tests... 2023-01-11T21:48:12.7767766Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7768059Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7768340Z test_all_gather_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all gather (0.002s) 2023-01-11T21:48:12.7768360Z 2023-01-11T21:48:12.7768620Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7768734Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7768754Z 2023-01-11T21:48:12.7768864Z OK (skipped=1) 2023-01-11T21:48:12.7768883Z 2023-01-11T21:48:12.7769010Z Generating XML reports... 
2023-01-11T21:48:12.7769456Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212620.xml 2023-01-11T21:48:12.7769834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7770015Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7770375Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7770570Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7770589Z 2023-01-11T21:48:12.7770705Z Running tests... 2023-01-11T21:48:12.7770969Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7771281Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7771551Z test_all_gather_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7771772Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3712 2023-01-11T21:48:12.7772046Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3713 2023-01-11T21:48:12.7772423Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7772582Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7773158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7773425Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7773813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7773990Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7774369Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7774561Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7774813Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7775033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7775436Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7775837Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7776070Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7776309Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.7776543Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7776786Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.7777188Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.7777520Z STAGE:2023-01-11 21:26:27 3712:3712 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7777901Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.7778232Z STAGE:2023-01-11 21:26:27 3713:3713 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7778512Z [1673472387.191710] [7e0e28e30a97:3713 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7778749Z [1673472388.212161] [7e0e28e30a97:3713 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7778989Z [1673472388.212161] [7e0e28e30a97:3713 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7779268Z [1673472387.168157] [7e0e28e30a97:3712 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.7779499Z [1673472388.230289] [7e0e28e30a97:3712 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7779740Z [1673472388.230289] [7e0e28e30a97:3712 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7780287Z STAGE:2023-01-11 21:26:28 3713:3713 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:26:28 3712:3712 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7780308Z 2023-01-11T21:48:12.7780874Z STAGE:2023-01-11 21:26:28 3713:3713 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:26:28 3712:3712 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7780965Z 2023-01-11T21:48:12.7781318Z STAGE:2023-01-11 21:26:28 3713:3713 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7781637Z STAGE:2023-01-11 21:26:28 3712:3712 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7781993Z STAGE:2023-01-11 21:26:28 3713:3713 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7782338Z STAGE:2023-01-11 21:26:28 3712:3712 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7782685Z STAGE:2023-01-11 21:26:28 3713:3713 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7783028Z STAGE:2023-01-11 21:26:28 3712:3712 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7783133Z ok (5.937s) 2023-01-11T21:48:12.7783152Z 2023-01-11T21:48:12.7783424Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7783540Z Ran 1 test in 5.937s 2023-01-11T21:48:12.7783560Z 2023-01-11T21:48:12.7783658Z OK 2023-01-11T21:48:12.7783677Z 2023-01-11T21:48:12.7783783Z Generating XML reports... 
2023-01-11T21:48:12.7784235Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212623.xml 2023-01-11T21:48:12.7784611Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7784791Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7785173Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7785369Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7785389Z 2023-01-11T21:48:12.7785500Z Running tests... 2023-01-11T21:48:12.7785766Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7786090Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7786333Z test_all_gather_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7786554Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3826 2023-01-11T21:48:12.7786771Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3827 2023-01-11T21:48:12.7787146Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7787326Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7787707Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7787903Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7788276Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7788432Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7788809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7789000Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7789252Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7789492Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7789893Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7790291Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7790585Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7790815Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7790956Z skip: Skipped due to small world size. (4.249s) 2023-01-11T21:48:12.7790977Z 2023-01-11T21:48:12.7791251Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7791410Z Ran 1 test in 4.249s 2023-01-11T21:48:12.7791431Z 2023-01-11T21:48:12.7791545Z OK (skipped=1) 2023-01-11T21:48:12.7791564Z 2023-01-11T21:48:12.7791690Z Generating XML reports... 2023-01-11T21:48:12.7792146Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212631.xml 2023-01-11T21:48:12.7792518Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7792697Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7793085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7793261Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7793280Z 2023-01-11T21:48:12.7793392Z Running tests... 2023-01-11T21:48:12.7793659Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7793974Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7794275Z test_all_gather_into_cat_tensor_cuda (__main__.TestDistBackendWithSpawn) ... 
skip: Only Nccl supports CUDA all_gather_into_tensor (0.002s) 2023-01-11T21:48:12.7794296Z 2023-01-11T21:48:12.7794555Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7794670Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7794689Z 2023-01-11T21:48:12.7794800Z OK (skipped=1) 2023-01-11T21:48:12.7794822Z 2023-01-11T21:48:12.7794949Z Generating XML reports... 2023-01-11T21:48:12.7795378Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212638.xml 2023-01-11T21:48:12.7795746Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7795924Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7796307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7796502Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7796522Z 2023-01-11T21:48:12.7796636Z Running tests... 2023-01-11T21:48:12.7796902Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7797212Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7797500Z test_all_gather_into_stack_tensor_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_gather_into_tensor (0.002s) 2023-01-11T21:48:12.7797543Z 2023-01-11T21:48:12.7797785Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7797900Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7797919Z 2023-01-11T21:48:12.7798030Z OK (skipped=1) 2023-01-11T21:48:12.7798049Z 2023-01-11T21:48:12.7798176Z Generating XML reports... 
2023-01-11T21:48:12.7798624Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212640.xml 2023-01-11T21:48:12.7798994Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7799173Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7799555Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7799789Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7799827Z 2023-01-11T21:48:12.7799918Z Running tests... 2023-01-11T21:48:12.7800188Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7800500Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7800835Z test_all_gather_multigpu (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl backend supports allgather multigpu (0.002s) 2023-01-11T21:48:12.7800856Z 2023-01-11T21:48:12.7801126Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7801240Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7801259Z 2023-01-11T21:48:12.7801368Z OK (skipped=1) 2023-01-11T21:48:12.7801387Z 2023-01-11T21:48:12.7801513Z Generating XML reports... 
2023-01-11T21:48:12.7801937Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212643.xml 2023-01-11T21:48:12.7802312Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7802491Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7802873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7803070Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7803089Z 2023-01-11T21:48:12.7803201Z Running tests... 2023-01-11T21:48:12.7803466Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7803777Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7804080Z test_all_gather_multigpu_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl backend supports allgather multigpu (0.002s) 2023-01-11T21:48:12.7804104Z 2023-01-11T21:48:12.7804347Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7804461Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7804480Z 2023-01-11T21:48:12.7804591Z OK (skipped=1) 2023-01-11T21:48:12.7804610Z 2023-01-11T21:48:12.7804734Z Generating XML reports... 
2023-01-11T21:48:12.7805181Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212645.xml 2023-01-11T21:48:12.7805560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7805738Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7806121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7806296Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7806335Z 2023-01-11T21:48:12.7806426Z Running tests... 2023-01-11T21:48:12.7806695Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7807009Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7807289Z test_all_gather_object_default_pg (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7807510Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4061 2023-01-11T21:48:12.7807725Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4062 2023-01-11T21:48:12.7808098Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7808277Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7808637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7808831Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7809284Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7809463Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7809842Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7810114Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7810370Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7810611Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7811020Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7811399Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7811635Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7811866Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7812147Z [1673472411.850952] [7e0e28e30a97:4062 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7812384Z [1673472412.643795] [7e0e28e30a97:4062 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7812624Z [1673472412.643795] [7e0e28e30a97:4062 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7813086Z [1673472411.849056] [7e0e28e30a97:4061 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7813334Z [1673472412.640738] [7e0e28e30a97:4061 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7813574Z [1673472412.640738] [7e0e28e30a97:4061 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7813661Z ok (5.927s) 2023-01-11T21:48:12.7813703Z 2023-01-11T21:48:12.7813957Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7814073Z Ran 1 test in 5.927s 2023-01-11T21:48:12.7814093Z 2023-01-11T21:48:12.7814188Z OK 2023-01-11T21:48:12.7814208Z 2023-01-11T21:48:12.7814334Z Generating XML reports... 
2023-01-11T21:48:12.7814785Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212647.xml 2023-01-11T21:48:12.7815157Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7815337Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7815720Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7815896Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7815915Z 2023-01-11T21:48:12.7816026Z Running tests... 2023-01-11T21:48:12.7816290Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7816607Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7816881Z test_all_gather_object_subgroup (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7817103Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4172 2023-01-11T21:48:12.7817317Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4173 2023-01-11T21:48:12.7817690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7817934Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7818322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7818517Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7818943Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7819127Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7819508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7819699Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7819948Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7820192Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7820574Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7820973Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7821206Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7821435Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7821681Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.7821920Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.7822321Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.7822718Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.7822961Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:48:12.7823177Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:48:12.7823574Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:48:12.7823969Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:48:12.7824212Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T21:48:12.7824448Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T21:48:12.7824843Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T21:48:12.7825232Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 
2023-01-11T21:48:12.7825517Z [1673472420.346636] [7e0e28e30a97:4172 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7825756Z [1673472421.143800] [7e0e28e30a97:4172 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7825998Z [1673472421.143800] [7e0e28e30a97:4172 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7826256Z [1673472420.370183] [7e0e28e30a97:4173 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7826555Z [1673472421.164588] [7e0e28e30a97:4173 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7826791Z [1673472421.164588] [7e0e28e30a97:4173 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7826898Z ok (6.524s) 2023-01-11T21:48:12.7826919Z 2023-01-11T21:48:12.7827195Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7827361Z Ran 1 test in 6.524s 2023-01-11T21:48:12.7827382Z 2023-01-11T21:48:12.7827479Z OK 2023-01-11T21:48:12.7827498Z 2023-01-11T21:48:12.7827626Z Generating XML reports... 
2023-01-11T21:48:12.7828081Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212656.xml 2023-01-11T21:48:12.7828434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7828617Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7828997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7829195Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7829214Z 2023-01-11T21:48:12.7829324Z Running tests... 2023-01-11T21:48:12.7829592Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7829907Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7830172Z test_all_gather_v_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports all_gather_v (0.002s) 2023-01-11T21:48:12.7830191Z 2023-01-11T21:48:12.7830454Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7830548Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7830567Z 2023-01-11T21:48:12.7830678Z OK (skipped=1) 2023-01-11T21:48:12.7830700Z 2023-01-11T21:48:12.7830828Z Generating XML reports... 
2023-01-11T21:48:12.7831274Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212705.xml 2023-01-11T21:48:12.7831643Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7831821Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7832211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7832405Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7832425Z 2023-01-11T21:48:12.7832516Z Running tests... 2023-01-11T21:48:12.7832780Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7833094Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7833520Z test_all_reduce_coalesced_full_group_max (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.7833541Z 2023-01-11T21:48:12.7833802Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7833916Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7833936Z 2023-01-11T21:48:12.7834047Z OK (skipped=1) 2023-01-11T21:48:12.7834066Z 2023-01-11T21:48:12.7834193Z Generating XML reports... 
2023-01-11T21:48:12.7834640Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212707.xml 2023-01-11T21:48:12.7834992Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7835170Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7835549Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7835803Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7835822Z 2023-01-11T21:48:12.7835934Z Running tests... 2023-01-11T21:48:12.7836205Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7836517Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7836981Z test_all_reduce_coalesced_full_group_min (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.7837002Z 2023-01-11T21:48:12.7837272Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7837367Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7837387Z 2023-01-11T21:48:12.7837497Z OK (skipped=1) 2023-01-11T21:48:12.7837516Z 2023-01-11T21:48:12.7837641Z Generating XML reports... 
2023-01-11T21:48:12.7838087Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212710.xml 2023-01-11T21:48:12.7838464Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7838645Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7839025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7839221Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7839241Z 2023-01-11T21:48:12.7839354Z Running tests... 2023-01-11T21:48:12.7839597Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7839911Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7840342Z test_all_reduce_coalesced_full_group_product (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.7840366Z 2023-01-11T21:48:12.7840631Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7840744Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7840763Z 2023-01-11T21:48:12.7840872Z OK (skipped=1) 2023-01-11T21:48:12.7840892Z 2023-01-11T21:48:12.7841017Z Generating XML reports... 
2023-01-11T21:48:12.7841460Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212712.xml 2023-01-11T21:48:12.7841834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7841993Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7842375Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7842571Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7842590Z 2023-01-11T21:48:12.7842702Z Running tests... 2023-01-11T21:48:12.7842970Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7843283Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7843703Z test_all_reduce_coalesced_full_group_sum (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.7843723Z 2023-01-11T21:48:12.7843983Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7844077Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7844117Z 2023-01-11T21:48:12.7844206Z OK (skipped=1) 2023-01-11T21:48:12.7844224Z 2023-01-11T21:48:12.7844351Z Generating XML reports... 
2023-01-11T21:48:12.7844793Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212714.xml 2023-01-11T21:48:12.7845163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7845406Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7845794Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7845989Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7846009Z 2023-01-11T21:48:12.7846120Z Running tests... 2023-01-11T21:48:12.7846408Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7846733Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7847147Z test_all_reduce_coalesced_group_max (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.7847167Z 2023-01-11T21:48:12.7847428Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7847541Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7847565Z 2023-01-11T21:48:12.7847675Z OK (skipped=1) 2023-01-11T21:48:12.7847695Z 2023-01-11T21:48:12.7847821Z Generating XML reports... 
2023-01-11T21:48:12.7848270Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212717.xml 2023-01-11T21:48:12.7848645Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7848806Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7849188Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7849383Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7849403Z 2023-01-11T21:48:12.7849515Z Running tests... 2023-01-11T21:48:12.7849777Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7850116Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7850533Z test_all_reduce_coalesced_group_min (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.7850554Z 2023-01-11T21:48:12.7850818Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7850933Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7850952Z 2023-01-11T21:48:12.7851043Z OK (skipped=1) 2023-01-11T21:48:12.7851062Z 2023-01-11T21:48:12.7851191Z Generating XML reports... 
2023-01-11T21:48:12.7851636Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212719.xml 2023-01-11T21:48:12.7852004Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7852184Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7852565Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7852762Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7852782Z 2023-01-11T21:48:12.7853066Z Running tests... 2023-01-11T21:48:12.7853323Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7853635Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7854056Z test_all_reduce_coalesced_group_product (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.7854076Z 2023-01-11T21:48:12.7854334Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7854448Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7854467Z 2023-01-11T21:48:12.7854577Z OK (skipped=1) 2023-01-11T21:48:12.7854597Z 2023-01-11T21:48:12.7854724Z Generating XML reports... 
2023-01-11T21:48:12.7855168Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212722.xml 2023-01-11T21:48:12.7855634Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7855794Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7856180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7856432Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7856454Z 2023-01-11T21:48:12.7856570Z Running tests... 2023-01-11T21:48:12.7856840Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7857151Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7857563Z test_all_reduce_coalesced_group_sum (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.7857587Z 2023-01-11T21:48:12.7857849Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7857963Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7857983Z 2023-01-11T21:48:12.7858073Z OK (skipped=1) 2023-01-11T21:48:12.7858112Z 2023-01-11T21:48:12.7858219Z Generating XML reports... 
2023-01-11T21:48:12.7858670Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212724.xml 2023-01-11T21:48:12.7859041Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7859220Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7859599Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7859794Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7859818Z 2023-01-11T21:48:12.7859929Z Running tests... 2023-01-11T21:48:12.7860193Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7860486Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7860883Z test_all_reduce_coalesced_max (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.7860903Z 2023-01-11T21:48:12.7861169Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7861284Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7861303Z 2023-01-11T21:48:12.7861414Z OK (skipped=1) 2023-01-11T21:48:12.7861433Z 2023-01-11T21:48:12.7861560Z Generating XML reports... 
2023-01-11T21:48:12.7862006Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212726.xml 2023-01-11T21:48:12.7862377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7862560Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7862922Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7863116Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7863136Z 2023-01-11T21:48:12.7863247Z Running tests... 2023-01-11T21:48:12.7863511Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7863821Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7864125Z test_all_reduce_coalesced_max_complex_unsupported (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7864350Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4634 2023-01-11T21:48:12.7864563Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4635 2023-01-11T21:48:12.7864989Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7865170Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7865552Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7865804Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7866179Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7866358Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7866735Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7866929Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7867182Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7867404Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7867806Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7868205Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7868437Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7869183Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T21:48:12.7869303Z warnings.warn( 2023-01-11T21:48:12.7869534Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7870268Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T21:48:12.7870384Z warnings.warn( 2023-01-11T21:48:12.7870469Z ok (4.237s) 2023-01-11T21:48:12.7870508Z 2023-01-11T21:48:12.7870756Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7870871Z Ran 1 test in 4.237s 2023-01-11T21:48:12.7870891Z 2023-01-11T21:48:12.7870986Z OK 2023-01-11T21:48:12.7871005Z 2023-01-11T21:48:12.7871133Z Generating XML reports... 
2023-01-11T21:48:12.7871582Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212729.xml 2023-01-11T21:48:12.7871953Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7872134Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7872514Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7872689Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7872731Z 2023-01-11T21:48:12.7872823Z Running tests... 2023-01-11T21:48:12.7873088Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7873402Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7873803Z test_all_reduce_coalesced_min (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.7873823Z 2023-01-11T21:48:12.7874083Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7874271Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7874290Z 2023-01-11T21:48:12.7874402Z OK (skipped=1) 2023-01-11T21:48:12.7874421Z 2023-01-11T21:48:12.7874548Z Generating XML reports... 
2023-01-11T21:48:12.7874984Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212735.xml 2023-01-11T21:48:12.7875400Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7875583Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7875970Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7876164Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7876184Z 2023-01-11T21:48:12.7876295Z Running tests... 2023-01-11T21:48:12.7876562Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7876873Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7877262Z test_all_reduce_coalesced_product (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.7877302Z 2023-01-11T21:48:12.7877543Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7877663Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7877682Z 2023-01-11T21:48:12.7877792Z OK (skipped=1) 2023-01-11T21:48:12.7877811Z 2023-01-11T21:48:12.7877937Z Generating XML reports... 
2023-01-11T21:48:12.7878384Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212738.xml 2023-01-11T21:48:12.7878754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7878936Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7879316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7879491Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7879530Z 2023-01-11T21:48:12.7879620Z Running tests... 2023-01-11T21:48:12.7879883Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7880199Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7880599Z test_all_reduce_coalesced_sum (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.7880619Z 2023-01-11T21:48:12.7880882Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7880997Z Ran 1 test in 0.002s 2023-01-11T21:48:12.7881017Z 2023-01-11T21:48:12.7881127Z OK (skipped=1) 2023-01-11T21:48:12.7881150Z 2023-01-11T21:48:12.7881276Z Generating XML reports... 
2023-01-11T21:48:12.7881701Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212740.xml 2023-01-11T21:48:12.7882071Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7882249Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7882631Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7882825Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7882845Z 2023-01-11T21:48:12.7882956Z Running tests... 2023-01-11T21:48:12.7883218Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7883531Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7883880Z test_all_reduce_complex_unsupported_ops (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7884084Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4836 2023-01-11T21:48:12.7884298Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4837 2023-01-11T21:48:12.7884677Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7884902Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7885297Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7885492Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7885858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7886039Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7886398Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7886589Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7886836Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7887081Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7887481Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7887875Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7888109Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7888344Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7888449Z ok (4.237s) 2023-01-11T21:48:12.7888469Z 2023-01-11T21:48:12.7888719Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7888834Z Ran 1 test in 4.237s 2023-01-11T21:48:12.7888854Z 2023-01-11T21:48:12.7888949Z OK 2023-01-11T21:48:12.7888968Z 2023-01-11T21:48:12.7889094Z Generating XML reports... 2023-01-11T21:48:12.7889541Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212743.xml 2023-01-11T21:48:12.7889908Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7890085Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7890464Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7890642Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7890680Z 2023-01-11T21:48:12.7890772Z Running tests... 2023-01-11T21:48:12.7891038Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7891349Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7891622Z test_all_reduce_full_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7891844Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4939 2023-01-11T21:48:12.7892058Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4940 2023-01-11T21:48:12.7892431Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7892608Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7893150Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7893434Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7893812Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7893989Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7894426Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7894627Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7894878Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7895118Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7895517Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7895924Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7896157Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7896399Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.7896633Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7896877Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.7897254Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.7897587Z STAGE:2023-01-11 21:27:53 4940:4940 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7897985Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.7898314Z STAGE:2023-01-11 21:27:53 4939:4939 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7898594Z [1673472473.803999] [7e0e28e30a97:4940 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7898832Z [1673472474.840065] [7e0e28e30a97:4940 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7899074Z [1673472474.840065] [7e0e28e30a97:4940 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7899413Z STAGE:2023-01-11 21:27:55 4940:4940 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7899692Z [1673472473.802205] [7e0e28e30a97:4939 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.7899926Z [1673472474.855644] [7e0e28e30a97:4939 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7900147Z [1673472474.855644] [7e0e28e30a97:4939 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7900485Z STAGE:2023-01-11 21:27:55 4939:4939 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7900834Z STAGE:2023-01-11 21:27:55 4940:4940 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7901179Z STAGE:2023-01-11 21:27:55 4939:4939 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7901504Z STAGE:2023-01-11 21:27:55 4940:4940 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7901824Z STAGE:2023-01-11 21:27:55 4939:4939 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7902222Z STAGE:2023-01-11 21:27:55 4940:4940 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7902551Z STAGE:2023-01-11 21:27:55 4939:4939 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7902891Z STAGE:2023-01-11 21:27:55 4940:4940 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7903260Z STAGE:2023-01-11 21:27:55 4939:4939 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7903371Z ok (5.830s) 2023-01-11T21:48:12.7903391Z 2023-01-11T21:48:12.7903663Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7903779Z Ran 1 test in 5.830s 2023-01-11T21:48:12.7903799Z 2023-01-11T21:48:12.7903893Z OK 2023-01-11T21:48:12.7903912Z 2023-01-11T21:48:12.7904039Z Generating XML reports... 
2023-01-11T21:48:12.7904488Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212749.xml 2023-01-11T21:48:12.7904866Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7905026Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7905407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7905604Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7905624Z 2023-01-11T21:48:12.7905734Z Running tests... 2023-01-11T21:48:12.7906002Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7906315Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7906586Z test_all_reduce_full_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7906808Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5053 2023-01-11T21:48:12.7907025Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5054 2023-01-11T21:48:12.7907377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7907552Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7907940Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7908133Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7908500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7908677Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7909058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7909256Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7909485Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7909725Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7910127Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7910527Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7910760Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7910997Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.7911226Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7911530Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.7911934Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.7912312Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.7912689Z STAGE:2023-01-11 21:28:02 5054:5054 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7913025Z STAGE:2023-01-11 21:28:02 5053:5053 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7913310Z [1673472482.185519] [7e0e28e30a97:5054 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7913544Z [1673472483.208080] [7e0e28e30a97:5054 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7913788Z [1673472483.208080] [7e0e28e30a97:5054 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7914062Z [1673472482.164880] [7e0e28e30a97:5053 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.7914297Z [1673472483.203987] [7e0e28e30a97:5053 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7914535Z [1673472483.203987] [7e0e28e30a97:5053 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7915084Z STAGE:2023-01-11 21:28:03 5054:5054 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:28:03 5053:5053 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7915105Z 2023-01-11T21:48:12.7915673Z STAGE:2023-01-11 21:28:03 5053:5053 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:28:03 5054:5054 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7915696Z 2023-01-11T21:48:12.7916022Z STAGE:2023-01-11 21:28:03 5054:5054 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7916324Z STAGE:2023-01-11 21:28:03 5053:5053 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7916655Z STAGE:2023-01-11 21:28:03 5054:5054 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7916990Z STAGE:2023-01-11 21:28:03 5053:5053 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7917335Z STAGE:2023-01-11 21:28:03 5054:5054 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7917678Z STAGE:2023-01-11 21:28:03 5053:5053 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7917781Z ok (5.831s) 2023-01-11T21:48:12.7917804Z 2023-01-11T21:48:12.7918073Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7918187Z Ran 1 test in 5.832s 2023-01-11T21:48:12.7918207Z 2023-01-11T21:48:12.7918283Z OK 2023-01-11T21:48:12.7918320Z 2023-01-11T21:48:12.7918427Z Generating XML reports... 
2023-01-11T21:48:12.7918875Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212758.xml 2023-01-11T21:48:12.7919251Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7919430Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7919809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7920001Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7920020Z 2023-01-11T21:48:12.7920133Z Running tests... 2023-01-11T21:48:12.7920486Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7920779Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7921058Z test_all_reduce_full_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7921278Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5167 2023-01-11T21:48:12.7921540Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5168 2023-01-11T21:48:12.7921925Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7922104Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7922487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7922681Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7923033Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7923209Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7923586Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7923781Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7924030Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7924268Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7924669Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7925065Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7925299Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7925524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.7925753Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7925993Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.7926389Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.7926720Z STAGE:2023-01-11 21:28:10 5168:5168 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7927112Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.7927445Z STAGE:2023-01-11 21:28:10 5167:5167 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7927724Z [1673472490.534289] [7e0e28e30a97:5168 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7927956Z [1673472491.581153] [7e0e28e30a97:5168 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7928198Z [1673472491.581153] [7e0e28e30a97:5168 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7928454Z [1673472490.532546] [7e0e28e30a97:5167 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.7928687Z [1673472491.582360] [7e0e28e30a97:5167 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7928925Z [1673472491.582360] [7e0e28e30a97:5167 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7929543Z STAGE:2023-01-11 21:28:11 5168:5168 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:28:11 5167:5167 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7929565Z 2023-01-11T21:48:12.7929913Z STAGE:2023-01-11 21:28:11 5168:5168 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7930302Z STAGE:2023-01-11 21:28:11 5167:5167 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7930638Z STAGE:2023-01-11 21:28:11 5167:5167 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7930955Z STAGE:2023-01-11 21:28:11 5168:5168 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7931279Z STAGE:2023-01-11 21:28:11 5167:5167 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7931613Z STAGE:2023-01-11 21:28:11 5168:5168 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7931938Z STAGE:2023-01-11 21:28:11 5167:5167 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7932278Z STAGE:2023-01-11 21:28:11 5168:5168 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7932379Z ok (5.846s) 2023-01-11T21:48:12.7932398Z 2023-01-11T21:48:12.7932667Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7932781Z Ran 1 test in 5.846s 2023-01-11T21:48:12.7932801Z 2023-01-11T21:48:12.7933114Z OK 2023-01-11T21:48:12.7933132Z 2023-01-11T21:48:12.7933270Z Generating XML reports... 
2023-01-11T21:48:12.7933729Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212806.xml 2023-01-11T21:48:12.7934083Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7934268Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7934650Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7934845Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7934865Z 2023-01-11T21:48:12.7934975Z Running tests... 2023-01-11T21:48:12.7935243Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7935554Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7935827Z test_all_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7936047Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5281 2023-01-11T21:48:12.7936241Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5282 2023-01-11T21:48:12.7936615Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7936789Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7937172Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7937364Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7937737Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7937915Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7938299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7938473Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7938718Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7939059Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7939466Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7939919Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7940162Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7940402Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.7940630Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7940874Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.7941264Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.7941659Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.7941991Z STAGE:2023-01-11 21:28:18 5281:5281 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7942317Z STAGE:2023-01-11 21:28:18 5282:5282 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7942599Z [1673472498.945394] [7e0e28e30a97:5282 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.7942835Z [1673472499.984334] [7e0e28e30a97:5282 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7943075Z [1673472499.984334] [7e0e28e30a97:5282 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7943355Z [1673472498.942816] [7e0e28e30a97:5281 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.7943586Z [1673472499.985128] [7e0e28e30a97:5281 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.7943829Z [1673472499.985128] [7e0e28e30a97:5281 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.7944379Z STAGE:2023-01-11 21:28:20 5282:5282 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:28:20 5281:5281 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7944400Z 2023-01-11T21:48:12.7944728Z STAGE:2023-01-11 21:28:20 5282:5282 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7945073Z STAGE:2023-01-11 21:28:20 5281:5281 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7945401Z STAGE:2023-01-11 21:28:20 5282:5282 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7945722Z STAGE:2023-01-11 21:28:20 5281:5281 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.7946051Z STAGE:2023-01-11 21:28:20 5282:5282 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7946383Z STAGE:2023-01-11 21:28:20 5281:5281 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.7946724Z STAGE:2023-01-11 21:28:20 5282:5282 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7947063Z STAGE:2023-01-11 21:28:20 5281:5281 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.7947167Z ok (5.909s) 2023-01-11T21:48:12.7947187Z 2023-01-11T21:48:12.7947433Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7947608Z Ran 1 test in 5.909s 2023-01-11T21:48:12.7947627Z 2023-01-11T21:48:12.7947724Z OK 2023-01-11T21:48:12.7947742Z 2023-01-11T21:48:12.7947869Z Generating XML reports... 
2023-01-11T21:48:12.7948324Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212815.xml 2023-01-11T21:48:12.7948700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7948926Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7949323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7949501Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7949539Z 2023-01-11T21:48:12.7949632Z Running tests... 2023-01-11T21:48:12.7949895Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7950212Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7950476Z test_all_reduce_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7950720Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5395 2023-01-11T21:48:12.7950932Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5396 2023-01-11T21:48:12.7951311Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7951489Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7951851Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7952045Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7952416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7952596Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7952974Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7953166Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7953417Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7953657Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7954039Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7954435Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7954666Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7954900Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7955061Z skip: Skipped due to small world size. (4.224s) 2023-01-11T21:48:12.7955081Z 2023-01-11T21:48:12.7955346Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7955461Z Ran 1 test in 4.224s 2023-01-11T21:48:12.7955480Z 2023-01-11T21:48:12.7955590Z OK (skipped=1) 2023-01-11T21:48:12.7955612Z 2023-01-11T21:48:12.7955738Z Generating XML reports... 2023-01-11T21:48:12.7956168Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212823.xml 2023-01-11T21:48:12.7956538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7956716Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7957093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7957377Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7957397Z 2023-01-11T21:48:12.7957507Z Running tests... 2023-01-11T21:48:12.7957778Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7958089Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7958401Z test_all_reduce_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7958609Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5498 2023-01-11T21:48:12.7958821Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5499 2023-01-11T21:48:12.7959201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7959384Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7959763Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7959957Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7960323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7960501Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7960859Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7961051Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7961297Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7961535Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7961937Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7962338Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7962569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7962802Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7962964Z skip: Skipped due to small world size. (4.235s) 2023-01-11T21:48:12.7962984Z 2023-01-11T21:48:12.7963230Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7963344Z Ran 1 test in 4.235s 2023-01-11T21:48:12.7963363Z 2023-01-11T21:48:12.7963473Z OK (skipped=1) 2023-01-11T21:48:12.7963492Z 2023-01-11T21:48:12.7963617Z Generating XML reports... 2023-01-11T21:48:12.7964067Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212830.xml 2023-01-11T21:48:12.7964439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7964616Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7964996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7965188Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7965208Z 2023-01-11T21:48:12.7965301Z Running tests... 2023-01-11T21:48:12.7965565Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7965877Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.7966150Z test_all_reduce_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.7966428Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5601 2023-01-11T21:48:12.7966642Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5602 2023-01-11T21:48:12.7967018Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7967194Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7967597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7967796Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7968169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7968345Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.7968724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.7968921Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.7969167Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.7969405Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.7969806Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.7970181Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.7970411Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.7970639Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.7970804Z skip: Skipped due to small world size. (4.222s) 2023-01-11T21:48:12.7970824Z 2023-01-11T21:48:12.7971089Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.7971204Z Ran 1 test in 4.222s 2023-01-11T21:48:12.7971223Z 2023-01-11T21:48:12.7971332Z OK (skipped=1) 2023-01-11T21:48:12.7971351Z 2023-01-11T21:48:12.7971478Z Generating XML reports... 2023-01-11T21:48:12.7971923Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212836.xml 2023-01-11T21:48:12.7972273Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.7972450Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8012759Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8013182Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8013215Z 2023-01-11T21:48:12.8013322Z Running tests... 2023-01-11T21:48:12.8013629Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8013960Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8014229Z test_all_reduce_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8014456Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5704 2023-01-11T21:48:12.8014676Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5705 2023-01-11T21:48:12.8015061Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8015238Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8015633Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8015981Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8016368Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8016540Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8017000Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8017201Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8017447Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8017691Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8018107Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8018519Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8018750Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8018973Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8019128Z skip: Skipped due to small world size. (4.250s) 2023-01-11T21:48:12.8019149Z 2023-01-11T21:48:12.8019427Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8019534Z Ran 1 test in 4.251s 2023-01-11T21:48:12.8019554Z 2023-01-11T21:48:12.8019655Z OK (skipped=1) 2023-01-11T21:48:12.8019675Z 2023-01-11T21:48:12.8019791Z Generating XML reports... 2023-01-11T21:48:12.8020251Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212843.xml 2023-01-11T21:48:12.8020636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8020806Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8021196Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8021384Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8021404Z 2023-01-11T21:48:12.8021503Z Running tests... 2023-01-11T21:48:12.8021779Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8022095Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8022346Z test_all_reduce_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8022564Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5807 2023-01-11T21:48:12.8022779Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5808 2023-01-11T21:48:12.8023161Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8023335Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8023729Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8023920Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8024302Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8024472Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8024865Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8025054Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8025358Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8025599Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8026013Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8026466Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8026705Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8027049Z STAGE:2023-01-11 21:28:54 5807:5807 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8027278Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8027652Z STAGE:2023-01-11 21:28:54 5808:5808 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8027942Z [1673472534.414030] [7e0e28e30a97:5808 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8028169Z [1673472535.434362] [7e0e28e30a97:5808 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8028415Z [1673472535.434362] [7e0e28e30a97:5808 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8028693Z [1673472534.391158] [7e0e28e30a97:5807 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8028924Z [1673472535.434929] [7e0e28e30a97:5807 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8029163Z [1673472535.434929] [7e0e28e30a97:5807 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8029732Z STAGE:2023-01-11 21:28:55 5808:5808 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:28:55 5807:5807 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8029753Z 2023-01-11T21:48:12.8030109Z STAGE:2023-01-11 21:28:55 5808:5808 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8030464Z STAGE:2023-01-11 21:28:55 5807:5807 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8030797Z STAGE:2023-01-11 21:28:55 5808:5808 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8031127Z STAGE:2023-01-11 21:28:55 5807:5807 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8031466Z STAGE:2023-01-11 21:28:55 5808:5808 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8031796Z STAGE:2023-01-11 21:28:55 5807:5807 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8032153Z STAGE:2023-01-11 21:28:55 5808:5808 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8032502Z STAGE:2023-01-11 21:28:55 5807:5807 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8032604Z ok (5.829s) 2023-01-11T21:48:12.8032624Z 2023-01-11T21:48:12.8032895Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8033009Z Ran 1 test in 5.829s 2023-01-11T21:48:12.8033029Z 2023-01-11T21:48:12.8033125Z OK 2023-01-11T21:48:12.8033144Z 2023-01-11T21:48:12.8033261Z Generating XML reports... 
2023-01-11T21:48:12.8033713Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212850.xml 2023-01-11T21:48:12.8034099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8034288Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8034770Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8034966Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8034986Z 2023-01-11T21:48:12.8035096Z Running tests... 2023-01-11T21:48:12.8035377Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8035742Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8035997Z test_all_reduce_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8036211Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5921 2023-01-11T21:48:12.8036428Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5922 2023-01-11T21:48:12.8036817Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8036993Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8037381Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8037577Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8037957Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8038139Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8038521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8038709Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8038962Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8039208Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8039624Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8040031Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8040270Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8040608Z STAGE:2023-01-11 21:29:02 5921:5921 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8040843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8041170Z STAGE:2023-01-11 21:29:02 5922:5922 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8041463Z [1673472542.723327] [7e0e28e30a97:5921 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8041699Z [1673472543.757079] [7e0e28e30a97:5921 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8041946Z [1673472543.757079] [7e0e28e30a97:5921 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8042227Z [1673472542.746070] [7e0e28e30a97:5922 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8042457Z [1673472543.779625] [7e0e28e30a97:5922 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8042705Z [1673472543.779625] [7e0e28e30a97:5922 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8043275Z STAGE:2023-01-11 21:29:04 5921:5921 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:29:04 5922:5922 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8043351Z 2023-01-11T21:48:12.8043721Z STAGE:2023-01-11 21:29:04 5922:5922 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8044083Z STAGE:2023-01-11 21:29:04 5921:5921 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8044465Z STAGE:2023-01-11 21:29:04 5922:5922 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8044802Z STAGE:2023-01-11 21:29:04 5921:5921 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8045152Z STAGE:2023-01-11 21:29:04 5922:5922 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8045487Z STAGE:2023-01-11 21:29:04 5921:5921 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8045849Z STAGE:2023-01-11 21:29:04 5922:5922 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8046206Z STAGE:2023-01-11 21:29:04 5921:5921 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8046307Z ok (5.831s) 2023-01-11T21:48:12.8046328Z 2023-01-11T21:48:12.8046595Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8046707Z Ran 1 test in 5.831s 2023-01-11T21:48:12.8046727Z 2023-01-11T21:48:12.8046803Z OK 2023-01-11T21:48:12.8046830Z 2023-01-11T21:48:12.8046944Z Generating XML reports... 
2023-01-11T21:48:12.8047414Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212858.xml 2023-01-11T21:48:12.8047794Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8047975Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8048366Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8048569Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8048589Z 2023-01-11T21:48:12.8048690Z Running tests... 2023-01-11T21:48:12.8048973Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8049285Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8049551Z test_all_reduce_multigpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8049768Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6035 2023-01-11T21:48:12.8049995Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6036 2023-01-11T21:48:12.8050379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8050558Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8050955Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8051181Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8051557Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8051732Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8052133Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8052334Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8052581Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8052831Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8053542Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8053952Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8054189Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8054499Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8054863Z STAGE:2023-01-11 21:29:11 6036:6036 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8055672Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T21:48:12.8055790Z warnings.warn( 2023-01-11T21:48:12.8056131Z STAGE:2023-01-11 21:29:12 6035:6035 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8056930Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T21:48:12.8057049Z warnings.warn( 2023-01-11T21:48:12.8057334Z [1673472552.104514] [7e0e28e30a97:6035 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8057577Z [1673472552.114147] [7e0e28e30a97:6035 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8057819Z [1673472552.114147] [7e0e28e30a97:6035 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8058101Z [1673472552.105709] [7e0e28e30a97:6036 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8058326Z [1673472552.114883] [7e0e28e30a97:6036 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8058573Z [1673472552.114883] [7e0e28e30a97:6036 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8059157Z STAGE:2023-01-11 21:29:12 6035:6035 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:29:12 6036:6036 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8059179Z 2023-01-11T21:48:12.8059765Z STAGE:2023-01-11 21:29:12 6036:6036 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:29:12 6035:6035 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8059789Z 2023-01-11T21:48:12.8060130Z STAGE:2023-01-11 21:29:12 6035:6035 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8060463Z STAGE:2023-01-11 21:29:12 6036:6036 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8060809Z STAGE:2023-01-11 21:29:12 6035:6035 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8061174Z STAGE:2023-01-11 21:29:12 6035:6035 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8061511Z STAGE:2023-01-11 21:29:12 6036:6036 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8061870Z STAGE:2023-01-11 21:29:12 6036:6036 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8061957Z ok (5.938s) 2023-01-11T21:48:12.8061983Z 2023-01-11T21:48:12.8062246Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8062363Z Ran 1 test in 5.938s 2023-01-11T21:48:12.8062457Z 2023-01-11T21:48:12.8062555Z OK 2023-01-11T21:48:12.8062575Z 2023-01-11T21:48:12.8062697Z Generating XML reports... 
2023-01-11T21:48:12.8063176Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212907.xml 2023-01-11T21:48:12.8063557Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8063787Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8064206Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8064390Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8064410Z 2023-01-11T21:48:12.8064510Z Running tests... 2023-01-11T21:48:12.8064790Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8065108Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8065395Z test_all_reduce_multigpu_complex (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8065615Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6153 2023-01-11T21:48:12.8065841Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6154 2023-01-11T21:48:12.8066238Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8066402Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8066797Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8066993Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8067374Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8067561Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8067962Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8068148Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8068401Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8068649Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8069052Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8069466Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8069694Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8069937Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8070272Z STAGE:2023-01-11 21:29:20 6154:6154 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8071078Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T21:48:12.8071194Z warnings.warn( 2023-01-11T21:48:12.8071527Z STAGE:2023-01-11 21:29:20 6153:6153 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8072326Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T21:48:12.8072495Z warnings.warn( 2023-01-11T21:48:12.8072769Z [1673472560.609642] [7e0e28e30a97:6153 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8073012Z [1673472560.619684] [7e0e28e30a97:6153 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8073299Z [1673472560.619684] [7e0e28e30a97:6153 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8073593Z [1673472560.609615] [7e0e28e30a97:6154 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8073827Z [1673472560.619671] [7e0e28e30a97:6154 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8074075Z [1673472560.619671] [7e0e28e30a97:6154 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8074650Z STAGE:2023-01-11 21:29:21 6153:6153 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:29:21 6154:6154 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8074671Z 2023-01-11T21:48:12.8075036Z STAGE:2023-01-11 21:29:21 6154:6154 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8075403Z STAGE:2023-01-11 21:29:21 6153:6153 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8075737Z STAGE:2023-01-11 21:29:21 6154:6154 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8076062Z STAGE:2023-01-11 21:29:21 6153:6153 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8076411Z STAGE:2023-01-11 21:29:21 6154:6154 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8076766Z STAGE:2023-01-11 21:29:21 6154:6154 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8077116Z STAGE:2023-01-11 21:29:21 6153:6153 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8077469Z STAGE:2023-01-11 21:29:21 6153:6153 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8077571Z ok (5.924s) 2023-01-11T21:48:12.8077591Z 2023-01-11T21:48:12.8077866Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8077981Z Ran 1 test in 5.925s 2023-01-11T21:48:12.8078001Z 2023-01-11T21:48:12.8078094Z OK 2023-01-11T21:48:12.8078114Z 2023-01-11T21:48:12.8078225Z Generating XML reports... 
2023-01-11T21:48:12.8078689Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212915.xml 2023-01-11T21:48:12.8079080Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8079268Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8079661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8079850Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8079870Z 2023-01-11T21:48:12.8079979Z Running tests... 2023-01-11T21:48:12.8080262Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8080574Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8080837Z test_all_reduce_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8081062Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6271 2023-01-11T21:48:12.8081281Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6272 2023-01-11T21:48:12.8081741Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8081915Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8082317Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8082505Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8082942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8083117Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8083513Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8083710Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8083966Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8084214Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8084630Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8085047Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8085282Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8085627Z STAGE:2023-01-11 21:29:27 6271:6271 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8085850Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8086185Z STAGE:2023-01-11 21:29:28 6272:6272 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8086479Z [1673472568.077125] [7e0e28e30a97:6271 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8086710Z [1673472569.127967] [7e0e28e30a97:6271 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8086957Z [1673472569.127967] [7e0e28e30a97:6271 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8087251Z [1673472568.100430] [7e0e28e30a97:6272 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8087485Z [1673472569.133107] [7e0e28e30a97:6272 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8087733Z [1673472569.133107] [7e0e28e30a97:6272 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8088299Z STAGE:2023-01-11 21:29:29 6271:6271 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:29:29 6272:6272 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8088323Z 2023-01-11T21:48:12.8088686Z STAGE:2023-01-11 21:29:29 6272:6272 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8089035Z STAGE:2023-01-11 21:29:29 6271:6271 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8089370Z STAGE:2023-01-11 21:29:29 6271:6271 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8089713Z STAGE:2023-01-11 21:29:29 6272:6272 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8090062Z STAGE:2023-01-11 21:29:29 6271:6271 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8090399Z STAGE:2023-01-11 21:29:29 6272:6272 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8090816Z STAGE:2023-01-11 21:29:29 6271:6271 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8091151Z STAGE:2023-01-11 21:29:29 6272:6272 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8091243Z ok (5.831s) 2023-01-11T21:48:12.8091262Z 2023-01-11T21:48:12.8091532Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8091630Z Ran 1 test in 5.832s 2023-01-11T21:48:12.8091650Z 2023-01-11T21:48:12.8091783Z OK 2023-01-11T21:48:12.8091805Z 2023-01-11T21:48:12.8091928Z Generating XML reports... 
2023-01-11T21:48:12.8092396Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212924.xml 2023-01-11T21:48:12.8092780Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8093198Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8093616Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8093806Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8093826Z 2023-01-11T21:48:12.8093935Z Running tests... 2023-01-11T21:48:12.8094196Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8094515Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8094790Z test_all_reduce_result_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8095019Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6385 2023-01-11T21:48:12.8095234Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6386 2023-01-11T21:48:12.8095623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8095802Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8096200Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8096381Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8096771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8096944Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8097343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8097533Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8097784Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8098037Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8098452Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8098866Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8099090Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8099322Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8099614Z [1673472577.228478] [7e0e28e30a97:6385 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8099845Z [1673472577.234933] [7e0e28e30a97:6385 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8100095Z [1673472577.234933] [7e0e28e30a97:6385 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8100494Z [1673472577.237894] [7e0e28e30a97:6386 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8100722Z [1673472577.242721] [7e0e28e30a97:6386 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8101025Z [1673472577.242721] [7e0e28e30a97:6386 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8101128Z ok (5.638s) 2023-01-11T21:48:12.8101149Z 2023-01-11T21:48:12.8101443Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8101542Z Ran 1 test in 5.638s 2023-01-11T21:48:12.8101562Z 2023-01-11T21:48:12.8101645Z OK 2023-01-11T21:48:12.8101665Z 2023-01-11T21:48:12.8101791Z Generating XML reports... 
2023-01-11T21:48:12.8102251Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212932.xml 2023-01-11T21:48:12.8102646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8102831Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8103226Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8103425Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8103444Z 2023-01-11T21:48:12.8103539Z Running tests... 2023-01-11T21:48:12.8103821Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8104141Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8104399Z test_all_reduce_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8104621Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6499 2023-01-11T21:48:12.8104847Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6500 2023-01-11T21:48:12.8105243Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8105424Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8105814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8106002Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8106391Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8106573Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8106963Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8107162Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8107416Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8107661Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8108084Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8108484Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8108726Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8108966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8109303Z STAGE:2023-01-11 21:29:44 6499:6499 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8109720Z STAGE:2023-01-11 21:29:44 6500:6500 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8110014Z [1673472584.610816] [7e0e28e30a97:6499 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8110293Z [1673472585.655603] [7e0e28e30a97:6499 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8110549Z [1673472585.655603] [7e0e28e30a97:6499 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8110842Z [1673472584.610841] [7e0e28e30a97:6500 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8111085Z [1673472585.668697] [7e0e28e30a97:6500 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8111326Z [1673472585.668697] [7e0e28e30a97:6500 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8111896Z STAGE:2023-01-11 21:29:46 6499:6499 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:29:46 6500:6500 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8111917Z 2023-01-11T21:48:12.8112284Z STAGE:2023-01-11 21:29:46 6500:6500 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8112650Z STAGE:2023-01-11 21:29:46 6499:6499 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8112990Z STAGE:2023-01-11 21:29:46 6500:6500 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8113322Z STAGE:2023-01-11 21:29:46 6499:6499 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8113672Z STAGE:2023-01-11 21:29:46 6500:6500 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8114024Z STAGE:2023-01-11 21:29:46 6499:6499 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8114392Z STAGE:2023-01-11 21:29:46 6500:6500 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8114742Z STAGE:2023-01-11 21:29:46 6499:6499 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8114832Z ok (5.783s) 2023-01-11T21:48:12.8114851Z 2023-01-11T21:48:12.8115131Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8115247Z Ran 1 test in 5.783s 2023-01-11T21:48:12.8115267Z 2023-01-11T21:48:12.8115351Z OK 2023-01-11T21:48:12.8115370Z 2023-01-11T21:48:12.8115497Z Generating XML reports... 
2023-01-11T21:48:12.8115968Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212940.xml 2023-01-11T21:48:12.8116369Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8116546Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8116930Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8117125Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8117144Z 2023-01-11T21:48:12.8117259Z Running tests... 2023-01-11T21:48:12.8117532Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8117857Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8118130Z test_all_reduce_sum_async (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8118360Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6613 2023-01-11T21:48:12.8118642Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6614 2023-01-11T21:48:12.8119038Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8119203Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8119607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8119844Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8120246Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8120429Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8120822Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8121018Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8121275Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8121512Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8121926Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8122340Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8122581Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8122917Z STAGE:2023-01-11 21:29:52 6613:6613 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8123153Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8123496Z STAGE:2023-01-11 21:29:52 6614:6614 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8123781Z [1673472592.980169] [7e0e28e30a97:6614 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8124016Z [1673472594.021019] [7e0e28e30a97:6614 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8124253Z [1673472594.021019] [7e0e28e30a97:6614 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8124544Z [1673472592.959860] [7e0e28e30a97:6613 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8124777Z [1673472594.004389] [7e0e28e30a97:6613 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8125029Z [1673472594.004389] [7e0e28e30a97:6613 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8125612Z STAGE:2023-01-11 21:29:54 6614:6614 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:29:54 6613:6613 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8125632Z 2023-01-11T21:48:12.8125989Z STAGE:2023-01-11 21:29:54 6614:6614 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8126352Z STAGE:2023-01-11 21:29:54 6613:6613 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8126692Z STAGE:2023-01-11 21:29:54 6613:6613 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8127035Z STAGE:2023-01-11 21:29:54 6614:6614 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8127376Z STAGE:2023-01-11 21:29:54 6613:6613 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8127721Z STAGE:2023-01-11 21:29:54 6614:6614 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8128135Z STAGE:2023-01-11 21:29:54 6613:6613 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8128496Z STAGE:2023-01-11 21:29:54 6614:6614 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8128588Z ok (5.822s) 2023-01-11T21:48:12.8128608Z 2023-01-11T21:48:12.8128965Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8129088Z Ran 1 test in 5.822s 2023-01-11T21:48:12.8129108Z 2023-01-11T21:48:12.8129192Z OK 2023-01-11T21:48:12.8129211Z 2023-01-11T21:48:12.8129338Z Generating XML reports... 
2023-01-11T21:48:12.8129805Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212949.xml 2023-01-11T21:48:12.8130180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8130366Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8130768Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8130969Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8130989Z 2023-01-11T21:48:12.8131105Z Running tests... 2023-01-11T21:48:12.8131379Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8131708Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8131987Z test_all_reduce_sum_complex (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8132203Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6727 2023-01-11T21:48:12.8132416Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6728 2023-01-11T21:48:12.8132805Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8133256Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8133658Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8133834Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8134239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8134440Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8134831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8135014Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8135267Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8135522Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8135942Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8136345Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8136586Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8136933Z STAGE:2023-01-11 21:30:01 6727:6727 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8137170Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8137505Z STAGE:2023-01-11 21:30:01 6728:6728 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8137779Z [1673472601.348521] [7e0e28e30a97:6728 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8138115Z [1673472602.410508] [7e0e28e30a97:6728 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8138366Z [1673472602.410508] [7e0e28e30a97:6728 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8138709Z [1673472601.346764] [7e0e28e30a97:6727 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8138947Z [1673472602.407023] [7e0e28e30a97:6727 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8139196Z [1673472602.407023] [7e0e28e30a97:6727 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8139774Z STAGE:2023-01-11 21:30:02 6728:6728 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:30:02 6727:6727 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8139799Z 2023-01-11T21:48:12.8140384Z STAGE:2023-01-11 21:30:02 6728:6728 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:30:02 6727:6727 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8140405Z 2023-01-11T21:48:12.8140752Z STAGE:2023-01-11 21:30:02 6728:6728 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8141097Z STAGE:2023-01-11 21:30:02 6727:6727 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8141447Z STAGE:2023-01-11 21:30:02 6728:6728 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8141784Z STAGE:2023-01-11 21:30:02 6727:6727 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8142131Z STAGE:2023-01-11 21:30:02 6728:6728 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8142493Z STAGE:2023-01-11 21:30:02 6727:6727 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8142586Z ok (5.835s) 2023-01-11T21:48:12.8142606Z 2023-01-11T21:48:12.8142886Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8143001Z Ran 1 test in 5.835s 2023-01-11T21:48:12.8143021Z 2023-01-11T21:48:12.8143114Z OK 2023-01-11T21:48:12.8143138Z 2023-01-11T21:48:12.8143258Z Generating XML reports... 
2023-01-11T21:48:12.8143729Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212957.xml 2023-01-11T21:48:12.8144104Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8144278Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8144681Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8144883Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8144903Z 2023-01-11T21:48:12.8145015Z Running tests... 2023-01-11T21:48:12.8145287Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8145613Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8145934Z test_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and NCCL backends will have CUDA allReduce tested (0.002s) 2023-01-11T21:48:12.8145955Z 2023-01-11T21:48:12.8146224Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8146323Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8146342Z 2023-01-11T21:48:12.8146451Z OK (skipped=1) 2023-01-11T21:48:12.8146471Z 2023-01-11T21:48:12.8146596Z Generating XML reports... 
2023-01-11T21:48:12.8147126Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213005.xml 2023-01-11T21:48:12.8147517Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8147703Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8148142Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8148348Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8148368Z 2023-01-11T21:48:12.8148480Z Running tests... 2023-01-11T21:48:12.8148749Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8149077Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8149392Z test_all_reduce_sum_cuda_async (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and NCCL backends will have CUDA allReduce tested (0.002s) 2023-01-11T21:48:12.8149417Z 2023-01-11T21:48:12.8149694Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8149806Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8149826Z 2023-01-11T21:48:12.8149927Z OK (skipped=1) 2023-01-11T21:48:12.8149947Z 2023-01-11T21:48:12.8150073Z Generating XML reports... 
2023-01-11T21:48:12.8150547Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213008.xml 2023-01-11T21:48:12.8150928Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8151096Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8151495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8151720Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8151744Z 2023-01-11T21:48:12.8151845Z Running tests... 2023-01-11T21:48:12.8152125Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8152452Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8152781Z test_all_reduce_sum_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and NCCL backends will have CUDA allReduce tested (0.002s) 2023-01-11T21:48:12.8152802Z 2023-01-11T21:48:12.8153072Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8153171Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8153206Z 2023-01-11T21:48:12.8153300Z OK (skipped=1) 2023-01-11T21:48:12.8153319Z 2023-01-11T21:48:12.8153447Z Generating XML reports... 
2023-01-11T21:48:12.8153912Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213010.xml 2023-01-11T21:48:12.8154298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8154479Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8154879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8155069Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8155089Z 2023-01-11T21:48:12.8155201Z Running tests... 2023-01-11T21:48:12.8155466Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8155784Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8156035Z test_all_to_all (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T21:48:12.8156055Z 2023-01-11T21:48:12.8156336Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8156515Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8156535Z 2023-01-11T21:48:12.8156636Z OK (skipped=1) 2023-01-11T21:48:12.8156655Z 2023-01-11T21:48:12.8156778Z Generating XML reports... 
2023-01-11T21:48:12.8157253Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213013.xml 2023-01-11T21:48:12.8157689Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8157865Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8158268Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8158468Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8158489Z 2023-01-11T21:48:12.8158591Z Running tests... 2023-01-11T21:48:12.8158867Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8159195Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8159454Z test_all_to_all_complex (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T21:48:12.8159474Z 2023-01-11T21:48:12.8159750Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8159848Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8159888Z 2023-01-11T21:48:12.8159985Z OK (skipped=1) 2023-01-11T21:48:12.8160004Z 2023-01-11T21:48:12.8160122Z Generating XML reports... 
2023-01-11T21:48:12.8160589Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213015.xml 2023-01-11T21:48:12.8160979Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8161153Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8161554Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8161752Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8161772Z 2023-01-11T21:48:12.8161884Z Running tests... 2023-01-11T21:48:12.8162146Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8162478Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8162749Z test_all_to_all_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL supports CUDA all_to_all (0.002s) 2023-01-11T21:48:12.8162770Z 2023-01-11T21:48:12.8163048Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8163162Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8163182Z 2023-01-11T21:48:12.8163293Z OK (skipped=1) 2023-01-11T21:48:12.8163313Z 2023-01-11T21:48:12.8163441Z Generating XML reports... 
2023-01-11T21:48:12.8163915Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213017.xml 2023-01-11T21:48:12.8164303Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8164470Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8164874Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8165073Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8165093Z 2023-01-11T21:48:12.8165203Z Running tests... 2023-01-11T21:48:12.8165482Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8165812Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8166089Z test_all_to_all_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL supports CUDA all_to_all (0.002s) 2023-01-11T21:48:12.8166176Z 2023-01-11T21:48:12.8166461Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8166576Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8166596Z 2023-01-11T21:48:12.8166689Z OK (skipped=1) 2023-01-11T21:48:12.8166709Z 2023-01-11T21:48:12.8166839Z Generating XML reports... 
2023-01-11T21:48:12.8167357Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213020.xml 2023-01-11T21:48:12.8167758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8167942Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8168330Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8168525Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8168549Z 2023-01-11T21:48:12.8168660Z Running tests... 2023-01-11T21:48:12.8168924Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8169250Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8169513Z test_all_to_all_full_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T21:48:12.8169532Z 2023-01-11T21:48:12.8169810Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8169916Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8169935Z 2023-01-11T21:48:12.8170043Z OK (skipped=1) 2023-01-11T21:48:12.8170062Z 2023-01-11T21:48:12.8170191Z Generating XML reports... 
2023-01-11T21:48:12.8170646Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213022.xml 2023-01-11T21:48:12.8171033Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8171202Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8171592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8171789Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8171808Z 2023-01-11T21:48:12.8171919Z Running tests... 2023-01-11T21:48:12.8172191Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8172518Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8172795Z test_all_to_all_full_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL supports CUDA all_to_all (0.002s) 2023-01-11T21:48:12.8172815Z 2023-01-11T21:48:12.8173316Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8173423Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8173447Z 2023-01-11T21:48:12.8173543Z OK (skipped=1) 2023-01-11T21:48:12.8173562Z 2023-01-11T21:48:12.8173689Z Generating XML reports... 
2023-01-11T21:48:12.8174152Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213025.xml 2023-01-11T21:48:12.8174540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8174727Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8175121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8175319Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8175339Z 2023-01-11T21:48:12.8175440Z Running tests... 2023-01-11T21:48:12.8175704Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8176029Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8176368Z test_all_to_all_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T21:48:12.8176388Z 2023-01-11T21:48:12.8176670Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8176785Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8176805Z 2023-01-11T21:48:12.8176914Z OK (skipped=1) 2023-01-11T21:48:12.8176933Z 2023-01-11T21:48:12.8177114Z Generating XML reports... 
2023-01-11T21:48:12.8177595Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213027.xml 2023-01-11T21:48:12.8177984Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8178149Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8178548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8178750Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8178770Z 2023-01-11T21:48:12.8178882Z Running tests... 2023-01-11T21:48:12.8179164Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8179496Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8179792Z test_all_to_all_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:48:12.8179812Z 2023-01-11T21:48:12.8180087Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8180204Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8180223Z 2023-01-11T21:48:12.8180317Z OK (skipped=1) 2023-01-11T21:48:12.8180336Z 2023-01-11T21:48:12.8180463Z Generating XML reports... 
2023-01-11T21:48:12.8180929Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213029.xml 2023-01-11T21:48:12.8181321Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8181505Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8181908Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8182109Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8182129Z 2023-01-11T21:48:12.8182242Z Running tests... 2023-01-11T21:48:12.8182522Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8182835Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8183135Z test_all_to_all_single_equal_split (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:48:12.8183159Z 2023-01-11T21:48:12.8183436Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8183550Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8183570Z 2023-01-11T21:48:12.8183680Z OK (skipped=1) 2023-01-11T21:48:12.8183699Z 2023-01-11T21:48:12.8183826Z Generating XML reports... 
2023-01-11T21:48:12.8184291Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213032.xml 2023-01-11T21:48:12.8184684Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8184851Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8185251Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8185450Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8185525Z 2023-01-11T21:48:12.8185641Z Running tests... 2023-01-11T21:48:12.8185929Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8186257Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8186571Z test_all_to_all_single_equal_split_complex (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:48:12.8186591Z 2023-01-11T21:48:12.8186917Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8187041Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8187061Z 2023-01-11T21:48:12.8187154Z OK (skipped=1) 2023-01-11T21:48:12.8187190Z 2023-01-11T21:48:12.8187300Z Generating XML reports... 
2023-01-11T21:48:12.8187771Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213034.xml 2023-01-11T21:48:12.8188164Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8188352Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8188752Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8188951Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8188970Z 2023-01-11T21:48:12.8189082Z Running tests... 2023-01-11T21:48:12.8189367Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8189678Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8189990Z test_all_to_all_single_equal_split_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:48:12.8190010Z 2023-01-11T21:48:12.8190283Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8190401Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8190424Z 2023-01-11T21:48:12.8190537Z OK (skipped=1) 2023-01-11T21:48:12.8190557Z 2023-01-11T21:48:12.8190683Z Generating XML reports... 
2023-01-11T21:48:12.8191152Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213037.xml 2023-01-11T21:48:12.8191544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8191731Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8192115Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8192313Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8192332Z 2023-01-11T21:48:12.8192443Z Running tests... 2023-01-11T21:48:12.8192722Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8193055Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8193377Z test_all_to_all_single_equal_split_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:48:12.8193398Z 2023-01-11T21:48:12.8193674Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8193789Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8193810Z 2023-01-11T21:48:12.8193923Z OK (skipped=1) 2023-01-11T21:48:12.8193943Z 2023-01-11T21:48:12.8194054Z Generating XML reports... 
2023-01-11T21:48:12.8194520Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213039.xml 2023-01-11T21:48:12.8194909Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8195094Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8195563Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8195763Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8195783Z 2023-01-11T21:48:12.8195895Z Running tests... 2023-01-11T21:48:12.8196176Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8196534Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8196855Z test_all_to_all_single_equal_split_full_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:48:12.8196875Z 2023-01-11T21:48:12.8197157Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8197274Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8197295Z 2023-01-11T21:48:12.8197405Z OK (skipped=1) 2023-01-11T21:48:12.8197424Z 2023-01-11T21:48:12.8197552Z Generating XML reports... 
2023-01-11T21:48:12.8198026Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213041.xml 2023-01-11T21:48:12.8198423Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8198605Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8198994Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8199194Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8199214Z 2023-01-11T21:48:12.8199325Z Running tests... 2023-01-11T21:48:12.8199609Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8199937Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8200262Z test_all_to_all_single_equal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:48:12.8200286Z 2023-01-11T21:48:12.8200565Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8200681Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8200701Z 2023-01-11T21:48:12.8200810Z OK (skipped=1) 2023-01-11T21:48:12.8200830Z 2023-01-11T21:48:12.8200939Z Generating XML reports... 
2023-01-11T21:48:12.8201407Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213044.xml 2023-01-11T21:48:12.8201801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8201983Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8202385Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8202583Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8202607Z 2023-01-11T21:48:12.8202719Z Running tests... 2023-01-11T21:48:12.8203000Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8203327Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8203622Z test_all_to_all_single_equal_split_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:48:12.8203643Z 2023-01-11T21:48:12.8203919Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8204035Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8204054Z 2023-01-11T21:48:12.8204163Z OK (skipped=1) 2023-01-11T21:48:12.8204182Z 2023-01-11T21:48:12.8204311Z Generating XML reports... 
2023-01-11T21:48:12.8204780Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213046.xml 2023-01-11T21:48:12.8205241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8205426Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8205829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8206011Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8206031Z 2023-01-11T21:48:12.8206191Z Running tests... 2023-01-11T21:48:12.8206484Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8206815Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8207134Z test_all_to_all_single_equal_split_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:48:12.8207155Z 2023-01-11T21:48:12.8207435Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8207556Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8207576Z 2023-01-11T21:48:12.8207687Z OK (skipped=1) 2023-01-11T21:48:12.8207707Z 2023-01-11T21:48:12.8207835Z Generating XML reports... 
2023-01-11T21:48:12.8208287Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213048.xml 2023-01-11T21:48:12.8208682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8208867Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8209267Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8209467Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8209487Z 2023-01-11T21:48:12.8209600Z Running tests... 2023-01-11T21:48:12.8209882Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8210216Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8210503Z test_all_to_all_single_unequal_split (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:48:12.8210541Z 2023-01-11T21:48:12.8210801Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8210921Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8210940Z 2023-01-11T21:48:12.8211051Z OK (skipped=1) 2023-01-11T21:48:12.8211071Z 2023-01-11T21:48:12.8211199Z Generating XML reports... 
2023-01-11T21:48:12.8211663Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213051.xml 2023-01-11T21:48:12.8212053Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8212236Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8212641Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8212822Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8213092Z 2023-01-11T21:48:12.8213196Z Running tests... 2023-01-11T21:48:12.8213485Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8213818Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8214133Z test_all_to_all_single_unequal_split_complex (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:48:12.8214154Z 2023-01-11T21:48:12.8214437Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8214554Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8214574Z 2023-01-11T21:48:12.8214685Z OK (skipped=1) 2023-01-11T21:48:12.8214782Z 2023-01-11T21:48:12.8214918Z Generating XML reports... 
2023-01-11T21:48:12.8215377Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213053.xml 2023-01-11T21:48:12.8215771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8215957Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8216420Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8216629Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8216649Z 2023-01-11T21:48:12.8216762Z Running tests... 2023-01-11T21:48:12.8217050Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8217381Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8217681Z test_all_to_all_single_unequal_split_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:48:12.8217719Z 2023-01-11T21:48:12.8217982Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8218098Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8218117Z 2023-01-11T21:48:12.8218230Z OK (skipped=1) 2023-01-11T21:48:12.8218249Z 2023-01-11T21:48:12.8218377Z Generating XML reports... 
2023-01-11T21:48:12.8218852Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213056.xml 2023-01-11T21:48:12.8219243Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8219426Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8219827Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8220013Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8220051Z 2023-01-11T21:48:12.8220145Z Running tests... 2023-01-11T21:48:12.8220427Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8220753Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8221086Z test_all_to_all_single_unequal_split_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:48:12.8221107Z 2023-01-11T21:48:12.8221384Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8221503Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8221522Z 2023-01-11T21:48:12.8221632Z OK (skipped=1) 2023-01-11T21:48:12.8221651Z 2023-01-11T21:48:12.8221779Z Generating XML reports... 
2023-01-11T21:48:12.8222229Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213058.xml 2023-01-11T21:48:12.8222620Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8222805Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8223206Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8223409Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8223428Z 2023-01-11T21:48:12.8223540Z Running tests... 2023-01-11T21:48:12.8223821Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8224148Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8224468Z test_all_to_all_single_unequal_split_full_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:48:12.8224545Z 2023-01-11T21:48:12.8224813Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8224932Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8224951Z 2023-01-11T21:48:12.8225062Z OK (skipped=1) 2023-01-11T21:48:12.8225081Z 2023-01-11T21:48:12.8225212Z Generating XML reports... 
2023-01-11T21:48:12.8225677Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213100.xml 2023-01-11T21:48:12.8226119Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8226311Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8226721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8226921Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8226941Z 2023-01-11T21:48:12.8227040Z Running tests... 2023-01-11T21:48:12.8227322Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8227648Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8227976Z test_all_to_all_single_unequal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:48:12.8227996Z 2023-01-11T21:48:12.8228277Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8228394Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8228413Z 2023-01-11T21:48:12.8228525Z OK (skipped=1) 2023-01-11T21:48:12.8228544Z 2023-01-11T21:48:12.8228672Z Generating XML reports... 
2023-01-11T21:48:12.8229137Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213103.xml 2023-01-11T21:48:12.8229509Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8229697Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8230100Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8230301Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8230321Z 2023-01-11T21:48:12.8230434Z Running tests... 2023-01-11T21:48:12.8230718Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8231045Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8231356Z test_all_to_all_single_unequal_split_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T21:48:12.8231376Z 2023-01-11T21:48:12.8231654Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8231753Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8231776Z 2023-01-11T21:48:12.8231884Z OK (skipped=1) 2023-01-11T21:48:12.8231903Z 2023-01-11T21:48:12.8232032Z Generating XML reports... 
2023-01-11T21:48:12.8232501Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213105.xml 2023-01-11T21:48:12.8232892Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8233081Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8233483Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8233682Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8233702Z 2023-01-11T21:48:12.8233796Z Running tests... 2023-01-11T21:48:12.8234078Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8234488Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8234812Z test_all_to_all_single_unequal_split_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T21:48:12.8234833Z 2023-01-11T21:48:12.8235108Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8235225Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8235245Z 2023-01-11T21:48:12.8235404Z OK (skipped=1) 2023-01-11T21:48:12.8235425Z 2023-01-11T21:48:12.8235558Z Generating XML reports... 
2023-01-11T21:48:12.8236031Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213107.xml 2023-01-11T21:48:12.8236403Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8236586Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8236995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8237197Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8237216Z 2023-01-11T21:48:12.8237328Z Running tests... 2023-01-11T21:48:12.8237608Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8237940Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8238212Z test_average_parameters (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8238440Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7732 2023-01-11T21:48:12.8238651Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7733 2023-01-11T21:48:12.8239046Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8239238Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8239640Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8239840Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8240228Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8240413Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8240815Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8240995Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8241250Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8241505Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8241932Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8242347Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8242589Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8242832Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8243127Z [1673472675.485737] [7e0e28e30a97:7732 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8243367Z [1673472675.491946] [7e0e28e30a97:7732 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8243601Z [1673472675.491946] [7e0e28e30a97:7732 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8243949Z [1673472675.495002] [7e0e28e30a97:7733 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8244192Z [1673472675.500290] [7e0e28e30a97:7733 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8244485Z [1673472675.500290] [7e0e28e30a97:7733 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8244743Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.8244996Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.8245426Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.8245846Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2023-01-11T21:48:12.8245952Z ok (6.142s) 2023-01-11T21:48:12.8245971Z 2023-01-11T21:48:12.8246254Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8246354Z Ran 1 test in 6.142s 2023-01-11T21:48:12.8246374Z 2023-01-11T21:48:12.8246468Z OK 2023-01-11T21:48:12.8246488Z 2023-01-11T21:48:12.8246616Z Generating XML reports... 2023-01-11T21:48:12.8247087Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213110.xml 2023-01-11T21:48:12.8247481Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8247666Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8248069Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8248272Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8248292Z 2023-01-11T21:48:12.8248403Z Running tests... 2023-01-11T21:48:12.8248667Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8248996Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8249274Z test_backend_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8249505Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7856 2023-01-11T21:48:12.8249733Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7857 2023-01-11T21:48:12.8250126Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8250309Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8250712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8250895Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8251288Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8251471Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8251873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8252097Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8252354Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8252609Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8253177Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8253705Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8253932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8254174Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8254393Z skip: Need at least 3 CUDA devices (4.226s) 2023-01-11T21:48:12.8254416Z 2023-01-11T21:48:12.8254713Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8254830Z Ran 1 test in 4.226s 2023-01-11T21:48:12.8254849Z 2023-01-11T21:48:12.8254961Z OK (skipped=1) 2023-01-11T21:48:12.8254981Z 2023-01-11T21:48:12.8255109Z Generating XML reports... 2023-01-11T21:48:12.8255579Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213119.xml 2023-01-11T21:48:12.8255958Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8256143Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8256544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8256746Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8256766Z 2023-01-11T21:48:12.8256878Z Running tests... 2023-01-11T21:48:12.8257161Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8257490Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8257752Z test_backend_group (__main__.TestDistBackendWithSpawn) ... 
skip: Test requires world size of 3 (0.002s) 2023-01-11T21:48:12.8257773Z 2023-01-11T21:48:12.8258053Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8258155Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8258174Z 2023-01-11T21:48:12.8258285Z OK (skipped=1) 2023-01-11T21:48:12.8258304Z 2023-01-11T21:48:12.8258430Z Generating XML reports... 2023-01-11T21:48:12.8258903Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213125.xml 2023-01-11T21:48:12.8259298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8259484Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8259891Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8260092Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8260112Z 2023-01-11T21:48:12.8260225Z Running tests... 2023-01-11T21:48:12.8260493Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8260819Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8261083Z test_barrier (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support CPU barrier (0.002s) 2023-01-11T21:48:12.8261102Z 2023-01-11T21:48:12.8261381Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8261496Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8261520Z 2023-01-11T21:48:12.8261631Z OK (skipped=1) 2023-01-11T21:48:12.8261651Z 2023-01-11T21:48:12.8261780Z Generating XML reports... 
2023-01-11T21:48:12.8262252Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213128.xml 2023-01-11T21:48:12.8262625Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8262811Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8263281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8263480Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8263500Z 2023-01-11T21:48:12.8263611Z Running tests... 2023-01-11T21:48:12.8263894Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8264272Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8264539Z test_barrier_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8264769Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8025 2023-01-11T21:48:12.8264981Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8026 2023-01-11T21:48:12.8265379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8265566Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8265969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8266166Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8266559Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8266744Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8267141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8267321Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8267577Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8267837Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8268259Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8268675Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8268920Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8269162Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8269454Z [1673472695.197474] [7e0e28e30a97:8026 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8269698Z [1673472695.203572] [7e0e28e30a97:8026 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8269953Z [1673472695.203572] [7e0e28e30a97:8026 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8270224Z [1673472695.193692] [7e0e28e30a97:8025 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8270467Z [1673472695.200366] [7e0e28e30a97:8025 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8270724Z [1673472695.200366] [7e0e28e30a97:8025 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8270832Z ok (6.249s) 2023-01-11T21:48:12.8270852Z 2023-01-11T21:48:12.8271140Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8271257Z Ran 1 test in 6.249s 2023-01-11T21:48:12.8271277Z 2023-01-11T21:48:12.8271370Z OK 2023-01-11T21:48:12.8271389Z 2023-01-11T21:48:12.8271518Z Generating XML reports... 
2023-01-11T21:48:12.8272058Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213130.xml 2023-01-11T21:48:12.8272432Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8272617Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8273066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8273271Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8273291Z 2023-01-11T21:48:12.8273403Z Running tests... 2023-01-11T21:48:12.8273687Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8274018Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8274297Z test_barrier_full_group (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support CPU barrier (0.002s) 2023-01-11T21:48:12.8274321Z 2023-01-11T21:48:12.8274598Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8274696Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8274716Z 2023-01-11T21:48:12.8274825Z OK (skipped=1) 2023-01-11T21:48:12.8274844Z 2023-01-11T21:48:12.8274972Z Generating XML reports... 
2023-01-11T21:48:12.8275445Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213139.xml 2023-01-11T21:48:12.8275838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8276023Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8276427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8276629Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8276652Z 2023-01-11T21:48:12.8276746Z Running tests... 2023-01-11T21:48:12.8277027Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8277356Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8277636Z test_barrier_full_group_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8277867Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8172 2023-01-11T21:48:12.8278099Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8173 2023-01-11T21:48:12.8278490Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8278674Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8279079Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8279264Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8279652Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8279835Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8280238Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8280437Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8280692Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8280948Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8281369Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8281868Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8282110Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8282352Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8282519Z skip: Skipped due to small world size. (4.221s) 2023-01-11T21:48:12.8282539Z 2023-01-11T21:48:12.8282869Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8282992Z Ran 1 test in 4.222s 2023-01-11T21:48:12.8283012Z 2023-01-11T21:48:12.8283126Z OK (skipped=1) 2023-01-11T21:48:12.8283145Z 2023-01-11T21:48:12.8283274Z Generating XML reports... 2023-01-11T21:48:12.8283749Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213141.xml 2023-01-11T21:48:12.8284124Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8284312Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8284714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8284915Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8284935Z 2023-01-11T21:48:12.8285048Z Running tests... 2023-01-11T21:48:12.8285336Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8285666Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8285942Z test_barrier_group (__main__.TestDistBackendWithSpawn) ... 
skip: ucc does not support CPU barrier (0.002s) 2023-01-11T21:48:12.8285962Z 2023-01-11T21:48:12.8286240Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8286339Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8286361Z 2023-01-11T21:48:12.8286471Z OK (skipped=1) 2023-01-11T21:48:12.8286491Z 2023-01-11T21:48:12.8286619Z Generating XML reports... 2023-01-11T21:48:12.8287089Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213148.xml 2023-01-11T21:48:12.8287478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8287666Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8288069Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8288269Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8288289Z 2023-01-11T21:48:12.8288399Z Running tests... 2023-01-11T21:48:12.8288662Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8288996Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8289267Z test_barrier_group_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8289497Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8308 2023-01-11T21:48:12.8289727Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8309 2023-01-11T21:48:12.8290121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8290307Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8290708Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8290887Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8291274Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8291519Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8291927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8292127Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8292428Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8292689Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8293334Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8293759Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8293983Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8294227Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8294394Z skip: Skipped due to small world size. (4.206s) 2023-01-11T21:48:12.8294415Z 2023-01-11T21:48:12.8294703Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8294818Z Ran 1 test in 4.206s 2023-01-11T21:48:12.8294838Z 2023-01-11T21:48:12.8294948Z OK (skipped=1) 2023-01-11T21:48:12.8294972Z 2023-01-11T21:48:12.8295101Z Generating XML reports... 2023-01-11T21:48:12.8295569Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213150.xml 2023-01-11T21:48:12.8295943Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8296130Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8296534Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8296739Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8296759Z 2023-01-11T21:48:12.8296870Z Running tests... 2023-01-11T21:48:12.8297153Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8297481Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8297780Z test_barrier_timeout_full_group (__main__.TestDistBackendWithSpawn) ... 
skip: Only gloo backend supports timeouts (0.002s) 2023-01-11T21:48:12.8297800Z 2023-01-11T21:48:12.8298081Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8298179Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8298198Z 2023-01-11T21:48:12.8298310Z OK (skipped=1) 2023-01-11T21:48:12.8298329Z 2023-01-11T21:48:12.8298455Z Generating XML reports... 2023-01-11T21:48:12.8298933Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213157.xml 2023-01-11T21:48:12.8299326Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8299510Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8299912Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8300113Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8300133Z 2023-01-11T21:48:12.8300244Z Running tests... 2023-01-11T21:48:12.8300507Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8300832Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8301122Z test_barrier_timeout_global (__main__.TestDistBackendWithSpawn) ... skip: Only gloo backend supports timeouts (0.002s) 2023-01-11T21:48:12.8301231Z 2023-01-11T21:48:12.8301525Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8301642Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8301662Z 2023-01-11T21:48:12.8301774Z OK (skipped=1) 2023-01-11T21:48:12.8301794Z 2023-01-11T21:48:12.8301923Z Generating XML reports... 
2023-01-11T21:48:12.8302454Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213200.xml 2023-01-11T21:48:12.8302862Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8303031Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8303434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8303632Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8303657Z 2023-01-11T21:48:12.8303768Z Running tests... 2023-01-11T21:48:12.8304050Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8304382Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8304671Z test_barrier_timeout_group (__main__.TestDistBackendWithSpawn) ... skip: Only gloo backend supports timeouts (0.002s) 2023-01-11T21:48:12.8304691Z 2023-01-11T21:48:12.8304973Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8305071Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8305108Z 2023-01-11T21:48:12.8305202Z OK (skipped=1) 2023-01-11T21:48:12.8305221Z 2023-01-11T21:48:12.8305348Z Generating XML reports... 
2023-01-11T21:48:12.8305821Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213202.xml 2023-01-11T21:48:12.8306214Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8306404Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8306807Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8307006Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8307026Z 2023-01-11T21:48:12.8307138Z Running tests... 2023-01-11T21:48:12.8307406Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8307734Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8308004Z test_batch_isend_irecv_gloo (__main__.TestDistBackendWithSpawn) ... skip: GLOO Batch Send Recv CPU (0.002s) 2023-01-11T21:48:12.8308024Z 2023-01-11T21:48:12.8308300Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8308417Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8308440Z 2023-01-11T21:48:12.8308553Z OK (skipped=1) 2023-01-11T21:48:12.8308572Z 2023-01-11T21:48:12.8308700Z Generating XML reports... 
2023-01-11T21:48:12.8309170Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213204.xml 2023-01-11T21:48:12.8309562Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8309733Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8310137Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8310338Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8310358Z 2023-01-11T21:48:12.8310471Z Running tests... 2023-01-11T21:48:12.8310756Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8311088Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8311425Z test_batch_isend_irecv_gloo_tags (__main__.TestDistBackendWithSpawn) ... skip: GLOO Batch Send Recv CPU (0.002s) 2023-01-11T21:48:12.8311446Z 2023-01-11T21:48:12.8311727Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8311824Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8311862Z 2023-01-11T21:48:12.8311957Z OK (skipped=1) 2023-01-11T21:48:12.8311976Z 2023-01-11T21:48:12.8312156Z Generating XML reports... 
2023-01-11T21:48:12.8312637Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213207.xml 2023-01-11T21:48:12.8313030Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8313215Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8313613Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8313818Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8313838Z 2023-01-11T21:48:12.8313950Z Running tests... 2023-01-11T21:48:12.8314212Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8314541Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8314835Z test_batch_isend_irecv_mixed_backend_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T21:48:12.8314855Z 2023-01-11T21:48:12.8315137Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8315251Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8315271Z 2023-01-11T21:48:12.8315384Z OK (skipped=1) 2023-01-11T21:48:12.8315403Z 2023-01-11T21:48:12.8315531Z Generating XML reports... 
2023-01-11T21:48:12.8315996Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213209.xml 2023-01-11T21:48:12.8316392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8316558Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8316958Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8317163Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8317183Z 2023-01-11T21:48:12.8317296Z Running tests... 2023-01-11T21:48:12.8317577Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8317909Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8318179Z test_batch_isend_irecv_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.003s) 2023-01-11T21:48:12.8318203Z 2023-01-11T21:48:12.8318480Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8318595Z Ran 1 test in 0.003s 2023-01-11T21:48:12.8318615Z 2023-01-11T21:48:12.8318709Z OK (skipped=1) 2023-01-11T21:48:12.8318728Z 2023-01-11T21:48:12.8318857Z Generating XML reports... 
2023-01-11T21:48:12.8319323Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213211.xml 2023-01-11T21:48:12.8319714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8319900Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8320303Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8320504Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8320524Z 2023-01-11T21:48:12.8320697Z Running tests... 2023-01-11T21:48:12.8320964Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8321294Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8321584Z test_batch_isend_irecv_no_rank_zero_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T21:48:12.8321604Z 2023-01-11T21:48:12.8321933Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8322056Z Ran 1 test in 0.003s 2023-01-11T21:48:12.8322075Z 2023-01-11T21:48:12.8322185Z OK (skipped=1) 2023-01-11T21:48:12.8322204Z 2023-01-11T21:48:12.8322332Z Generating XML reports... 
2023-01-11T21:48:12.8322807Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213214.xml 2023-01-11T21:48:12.8323198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8323370Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8323772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8323974Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8323993Z 2023-01-11T21:48:12.8324104Z Running tests... 2023-01-11T21:48:12.8324387Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8324715Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8324988Z test_batch_isend_irecv_op_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T21:48:12.8325008Z 2023-01-11T21:48:12.8325285Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8325399Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8325418Z 2023-01-11T21:48:12.8325514Z OK (skipped=1) 2023-01-11T21:48:12.8325534Z 2023-01-11T21:48:12.8325661Z Generating XML reports... 
2023-01-11T21:48:12.8326130Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213216.xml 2023-01-11T21:48:12.8326523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8326709Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8327112Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8327314Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8327334Z 2023-01-11T21:48:12.8327445Z Running tests... 2023-01-11T21:48:12.8327709Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8328038Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8328357Z test_batch_isend_irecv_op_list_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T21:48:12.8328377Z 2023-01-11T21:48:12.8328658Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8328774Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8328794Z 2023-01-11T21:48:12.8328907Z OK (skipped=1) 2023-01-11T21:48:12.8328926Z 2023-01-11T21:48:12.8329051Z Generating XML reports... 
2023-01-11T21:48:12.8329523Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213219.xml 2023-01-11T21:48:12.8329918Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8330085Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8330487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8330750Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8330769Z 2023-01-11T21:48:12.8330883Z Running tests... 2023-01-11T21:48:12.8331168Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8331495Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8331832Z test_batch_isend_irecv_ring_exchange_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T21:48:12.8331854Z 2023-01-11T21:48:12.8332138Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8332255Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8332274Z 2023-01-11T21:48:12.8332367Z OK (skipped=1) 2023-01-11T21:48:12.8332386Z 2023-01-11T21:48:12.8332515Z Generating XML reports... 
2023-01-11T21:48:12.8333207Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213221.xml 2023-01-11T21:48:12.8333615Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8333801Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8334204Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8334408Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8334429Z 2023-01-11T21:48:12.8334541Z Running tests... 2023-01-11T21:48:12.8334822Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8335138Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8335415Z test_batch_isend_irecv_self_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T21:48:12.8335435Z 2023-01-11T21:48:12.8335717Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8335834Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8335853Z 2023-01-11T21:48:12.8335965Z OK (skipped=1) 2023-01-11T21:48:12.8335985Z 2023-01-11T21:48:12.8336112Z Generating XML reports... 
2023-01-11T21:48:12.8336582Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213223.xml 2023-01-11T21:48:12.8336976Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8337144Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8337549Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8337752Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8337771Z 2023-01-11T21:48:12.8337883Z Running tests... 2023-01-11T21:48:12.8338170Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8338498Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8338775Z test_batch_isend_irecv_tensor_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T21:48:12.8338795Z 2023-01-11T21:48:12.8339070Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8339188Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8339208Z 2023-01-11T21:48:12.8339306Z OK (skipped=1) 2023-01-11T21:48:12.8339340Z 2023-01-11T21:48:12.8339452Z Generating XML reports... 
2023-01-11T21:48:12.8339916Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213226.xml 2023-01-11T21:48:12.8340307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8340490Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8340993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8341193Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8341215Z 2023-01-11T21:48:12.8341325Z Running tests... 2023-01-11T21:48:12.8341606Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8341982Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8342247Z test_broadcast (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8342478Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8840 2023-01-11T21:48:12.8342705Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8841 2023-01-11T21:48:12.8343104Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8343292Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8343690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8343889Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8344263Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8344449Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8344849Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8345047Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8345301Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8345561Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8345983Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8346400Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8346646Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8346978Z STAGE:2023-01-11 21:32:32 8841:8841 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8347216Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8347561Z STAGE:2023-01-11 21:32:32 8840:8840 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8347853Z [1673472752.549418] [7e0e28e30a97:8840 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8348098Z [1673472753.607802] [7e0e28e30a97:8840 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8348353Z [1673472753.607802] [7e0e28e30a97:8840 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8348648Z [1673472752.549676] [7e0e28e30a97:8841 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8348892Z [1673472753.607803] [7e0e28e30a97:8841 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8349143Z [1673472753.607803] [7e0e28e30a97:8841 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8349716Z STAGE:2023-01-11 21:32:33 8840:8840 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:32:33 8841:8841 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8349792Z 2023-01-11T21:48:12.8350173Z STAGE:2023-01-11 21:32:33 8841:8841 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8350519Z STAGE:2023-01-11 21:32:33 8840:8840 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8350908Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8351278Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8351641Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8351983Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8352332Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8352726Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8353068Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8353394Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8353750Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:306] Completed Stage: Collection 
2023-01-11T21:48:12.8354098Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8354461Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8354821Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8355162Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8355511Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8355863Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8356209Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8356557Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8356921Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8357262Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8357605Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8357956Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8358311Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8358672Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8359031Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8359376Z STAGE:2023-01-11 
21:32:34 8840:8840 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8359701Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8360051Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8360398Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8360762Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8361189Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8361531Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8361872Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8362273Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8362622Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8362967Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8363315Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8363648Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8363990Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8364338Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8364675Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:306] 
Completed Stage: Collection 2023-01-11T21:48:12.8365038Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8365398Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8365737Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8366061Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8366415Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8366761Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8367126Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8367489Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8367834Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8368178Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8368527Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8368857Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8369224Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8369586Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8369930Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 
2023-01-11T21:48:12.8370273Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8370624Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8370974Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8371338Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8371697Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8372099Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8372444Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8372792Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8373458Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8373843Z STAGE:2023-01-11 21:32:34 8840:8840 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8374210Z STAGE:2023-01-11 21:32:34 8841:8841 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8374316Z ok (5.838s) 2023-01-11T21:48:12.8374337Z 2023-01-11T21:48:12.8374619Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8374735Z Ran 1 test in 5.839s 2023-01-11T21:48:12.8374759Z 2023-01-11T21:48:12.8374835Z OK 2023-01-11T21:48:12.8374854Z 2023-01-11T21:48:12.8374984Z Generating XML reports... 
2023-01-11T21:48:12.8375457Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213228.xml 2023-01-11T21:48:12.8375851Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8376040Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8376441Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8376644Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8376665Z 2023-01-11T21:48:12.8376780Z Running tests... 2023-01-11T21:48:12.8377047Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8377378Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8377684Z test_broadcast_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and Nccl backend supports CUDA allReduce (0.002s) 2023-01-11T21:48:12.8377704Z 2023-01-11T21:48:12.8377979Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8378095Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8378115Z 2023-01-11T21:48:12.8378226Z OK (skipped=1) 2023-01-11T21:48:12.8378245Z 2023-01-11T21:48:12.8378376Z Generating XML reports... 
2023-01-11T21:48:12.8378847Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213237.xml 2023-01-11T21:48:12.8379239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8379405Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8379808Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8380012Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8380032Z 2023-01-11T21:48:12.8380144Z Running tests... 2023-01-11T21:48:12.8380426Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8380756Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8381036Z test_broadcast_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8381265Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8987 2023-01-11T21:48:12.8381497Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8988 2023-01-11T21:48:12.8381873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8382057Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8382544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8382744Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8383133Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8383360Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8383779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8383981Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8384219Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8384640Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8384897Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8385317Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8385560Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8385806Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8386061Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.8386315Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.8386739Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.8387135Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.8387485Z STAGE:2023-01-11 21:32:43 8988:8988 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8387829Z STAGE:2023-01-11 21:32:43 8987:8987 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8388128Z [1673472763.275597] [7e0e28e30a97:8987 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8388375Z [1673472764.323863] [7e0e28e30a97:8987 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8388631Z [1673472764.323863] [7e0e28e30a97:8987 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8388920Z [1673472763.296088] [7e0e28e30a97:8988 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8389167Z [1673472764.316912] [7e0e28e30a97:8988 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8389421Z [1673472764.316912] [7e0e28e30a97:8988 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8389990Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8390012Z 2023-01-11T21:48:12.8390381Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8390728Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8391070Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8391414Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8391833Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8392197Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8392545Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8392940Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8393297Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8393639Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8393974Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:306] Completed Stage: Collection 
2023-01-11T21:48:12.8394324Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8394689Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8395050Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8395390Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8395735Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8396301Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8396322Z 2023-01-11T21:48:12.8396915Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8396939Z 2023-01-11T21:48:12.8397281Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8397624Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8397955Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8398308Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8398671Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8399034Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8399376Z STAGE:2023-01-11 
21:32:44 8987:8987 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8399722Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8400071Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8400420Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8400789Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8401134Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8401476Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8401817Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8402165Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8402595Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8402941Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8403303Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8403689Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8404042Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8404376Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8404724Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:306] 
Completed Stage: Collection 2023-01-11T21:48:12.8405087Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8405455Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8405797Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8406142Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8406494Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8406845Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8407209Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8407552Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8407891Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8408239Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8408588Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8408937Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8409302Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8409665Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8410007Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 
2023-01-11T21:48:12.8410329Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8410678Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8411028Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8411391Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8411753Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8412098Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8412440Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8412790Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8413330Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8413676Z STAGE:2023-01-11 21:32:44 8988:8988 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8414135Z STAGE:2023-01-11 21:32:44 8987:8987 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8414240Z ok (5.837s) 2023-01-11T21:48:12.8414260Z 2023-01-11T21:48:12.8414545Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8414662Z Ran 1 test in 5.837s 2023-01-11T21:48:12.8414682Z 2023-01-11T21:48:12.8414776Z OK 2023-01-11T21:48:12.8414796Z 2023-01-11T21:48:12.8414988Z Generating XML reports... 
2023-01-11T21:48:12.8415478Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213239.xml 2023-01-11T21:48:12.8415875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8416042Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8416446Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8416649Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8416669Z 2023-01-11T21:48:12.8416782Z Running tests... 2023-01-11T21:48:12.8417064Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8417394Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8417665Z test_broadcast_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8417897Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9101 2023-01-11T21:48:12.8418107Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9102 2023-01-11T21:48:12.8418498Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8418682Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8419091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8419293Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8419685Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8419872Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8420275Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8420477Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8420715Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8420969Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8421398Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8421815Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8422058Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8422304Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8422472Z skip: Skipped due to small world size. (4.242s) 2023-01-11T21:48:12.8422492Z 2023-01-11T21:48:12.8422778Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8422878Z Ran 1 test in 4.242s 2023-01-11T21:48:12.8422916Z 2023-01-11T21:48:12.8423011Z OK (skipped=1) 2023-01-11T21:48:12.8423030Z 2023-01-11T21:48:12.8423159Z Generating XML reports... 2023-01-11T21:48:12.8423629Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213247.xml 2023-01-11T21:48:12.8424088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8424273Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8424678Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8424928Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8424950Z 2023-01-11T21:48:12.8425067Z Running tests... 2023-01-11T21:48:12.8425335Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8425665Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8425939Z test_broadcast_multigpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8426171Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9204 2023-01-11T21:48:12.8426402Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9205 2023-01-11T21:48:12.8426799Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8426983Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8427390Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8427591Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8427964Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8428147Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8428551Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8428755Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8429013Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8429268Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8429691Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8430107Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8430331Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8430571Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8431386Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1478: UserWarning: torch.distributed.broadcast_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T21:48:12.8431507Z warnings.warn( 2023-01-11T21:48:12.8432315Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1478: UserWarning: torch.distributed.broadcast_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T21:48:12.8432431Z warnings.warn( 2023-01-11T21:48:12.8432723Z [1673472779.208150] [7e0e28e30a97:9205 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8432965Z [1673472779.214275] [7e0e28e30a97:9205 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8433284Z [1673472779.214275] [7e0e28e30a97:9205 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8433570Z [1673472779.205810] [7e0e28e30a97:9204 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8433883Z [1673472779.211102] [7e0e28e30a97:9204 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8434129Z [1673472779.211102] [7e0e28e30a97:9204 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8434234Z ok (5.454s) 2023-01-11T21:48:12.8434254Z 2023-01-11T21:48:12.8434545Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8434661Z Ran 1 test in 5.454s 2023-01-11T21:48:12.8434681Z 2023-01-11T21:48:12.8434776Z OK 2023-01-11T21:48:12.8434795Z 2023-01-11T21:48:12.8434926Z Generating XML reports... 2023-01-11T21:48:12.8435400Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213254.xml 2023-01-11T21:48:12.8435792Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8435979Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8436362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8436563Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8436583Z 2023-01-11T21:48:12.8436695Z Running tests... 2023-01-11T21:48:12.8436978Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8437307Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8437584Z test_broadcast_object_list (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8438383Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82847 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.629s) 2023-01-11T21:48:12.8438405Z 2023-01-11T21:48:12.8438690Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8438808Z Ran 1 test in 1.629s 2023-01-11T21:48:12.8438827Z 2023-01-11T21:48:12.8438920Z OK (skipped=1) 2023-01-11T21:48:12.8438957Z 2023-01-11T21:48:12.8439067Z Generating XML reports... 2023-01-11T21:48:12.8439535Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213302.xml 2023-01-11T21:48:12.8439925Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8440113Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8440514Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8440712Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8440732Z 2023-01-11T21:48:12.8440844Z Running tests... 2023-01-11T21:48:12.8441129Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8441443Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8441774Z test_compute_bucket_assignment_by_size_sparse_error_with_logger (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8442564Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/85012 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.616s) 2023-01-11T21:48:12.8442652Z 2023-01-11T21:48:12.8442944Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8443064Z Ran 1 test in 1.616s 2023-01-11T21:48:12.8443083Z 2023-01-11T21:48:12.8443198Z OK (skipped=1) 2023-01-11T21:48:12.8443217Z 2023-01-11T21:48:12.8443347Z Generating XML reports... 2023-01-11T21:48:12.8443860Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213306.xml 2023-01-11T21:48:12.8444261Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8446411Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8446809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8447015Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8447035Z 2023-01-11T21:48:12.8447147Z Running tests... 2023-01-11T21:48:12.8447428Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8447757Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8448096Z test_compute_bucket_assignment_by_size_sparse_error_without_logger (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8448887Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/85339 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.647s) 2023-01-11T21:48:12.8448908Z 2023-01-11T21:48:12.8449178Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8449299Z Ran 1 test in 1.647s 2023-01-11T21:48:12.8449319Z 2023-01-11T21:48:12.8449430Z OK (skipped=1) 2023-01-11T21:48:12.8449449Z 2023-01-11T21:48:12.8449559Z Generating XML reports... 2023-01-11T21:48:12.8450026Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213310.xml 2023-01-11T21:48:12.8450420Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8450606Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8451010Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8451208Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8451228Z 2023-01-11T21:48:12.8451340Z Running tests... 2023-01-11T21:48:12.8451619Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8451935Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8452224Z test_ddp_apply_optim_in_backward (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8452452Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9420 2023-01-11T21:48:12.8452678Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9421 2023-01-11T21:48:12.8453243Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8453434Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8453845Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8454044Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8454540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8454705Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8455109Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8455308Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8455627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8455892Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8456318Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8456735Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8456984Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8457225Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8458044Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T21:48:12.8458159Z warnings.warn( 2023-01-11T21:48:12.8458979Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T21:48:12.8459096Z warnings.warn( 2023-01-11T21:48:12.8459391Z [1673472799.666766] [7e0e28e30a97:9420 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8459636Z [1673472799.673402] [7e0e28e30a97:9420 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8459890Z [1673472799.673402] [7e0e28e30a97:9420 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8460183Z [1673472799.667214] [7e0e28e30a97:9421 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8460427Z [1673472799.673704] [7e0e28e30a97:9421 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8460678Z [1673472799.673704] [7e0e28e30a97:9421 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8460930Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8461161Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8461406Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8461652Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8461901Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8462145Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8462388Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8462629Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8462732Z ok (7.318s) 2023-01-11T21:48:12.8462752Z 2023-01-11T21:48:12.8463088Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8463206Z Ran 1 test in 7.318s 2023-01-11T21:48:12.8463227Z 2023-01-11T21:48:12.8463320Z OK 2023-01-11T21:48:12.8463340Z 2023-01-11T21:48:12.8463470Z Generating XML reports... 
2023-01-11T21:48:12.8463939Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213315.xml 2023-01-11T21:48:12.8464378Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8464570Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8464980Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8465180Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8465201Z 2023-01-11T21:48:12.8465295Z Running tests... 2023-01-11T21:48:12.8465580Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8465909Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8466235Z test_ddp_apply_optim_in_backward_grad_as_bucket_view_false (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8466466Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9538 2023-01-11T21:48:12.8466697Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9539 2023-01-11T21:48:12.8467088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8467273Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8467656Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8467855Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8468247Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8468432Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8468834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8469034Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8469291Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8469547Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8469965Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8470364Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8470608Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8470848Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8471678Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T21:48:12.8471794Z warnings.warn( 2023-01-11T21:48:12.8472617Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T21:48:12.8472791Z warnings.warn( 2023-01-11T21:48:12.8473086Z [1673472809.541941] [7e0e28e30a97:9538 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8473330Z [1673472809.548183] [7e0e28e30a97:9538 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8473633Z [1673472809.548183] [7e0e28e30a97:9538 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8473930Z [1673472809.543343] [7e0e28e30a97:9539 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8474153Z [1673472809.548827] [7e0e28e30a97:9539 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8474408Z [1673472809.548827] [7e0e28e30a97:9539 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8474661Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8474908Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8475156Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8475406Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8475509Z ok (6.537s) 2023-01-11T21:48:12.8475529Z 2023-01-11T21:48:12.8475822Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8475920Z Ran 1 test in 6.537s 2023-01-11T21:48:12.8475958Z 2023-01-11T21:48:12.8476033Z OK 2023-01-11T21:48:12.8476052Z 2023-01-11T21:48:12.8476180Z Generating XML reports... 2023-01-11T21:48:12.8476648Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213324.xml 2023-01-11T21:48:12.8477045Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8477228Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8477632Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8477834Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8477854Z 2023-01-11T21:48:12.8477966Z Running tests... 
2023-01-11T21:48:12.8478230Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8478559Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8478869Z test_ddp_apply_optim_in_backward_ignored_params (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8479101Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9656 2023-01-11T21:48:12.8479331Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9657 2023-01-11T21:48:12.8479720Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8479904Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8480296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8480483Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8480868Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8481067Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8481466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8481727Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8481984Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8482242Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8482716Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8483144Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8483368Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8483610Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8484434Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T21:48:12.8484555Z warnings.warn( 2023-01-11T21:48:12.8485378Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T21:48:12.8485495Z warnings.warn( 2023-01-11T21:48:12.8485789Z [1673472818.616661] [7e0e28e30a97:9657 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8486036Z [1673472818.622644] [7e0e28e30a97:9657 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8486291Z [1673472818.622644] [7e0e28e30a97:9657 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8486578Z [1673472818.616490] [7e0e28e30a97:9656 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8486823Z [1673472818.622678] [7e0e28e30a97:9656 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8487059Z [1673472818.622678] [7e0e28e30a97:9656 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8487165Z ok (6.635s) 2023-01-11T21:48:12.8487185Z 2023-01-11T21:48:12.8487470Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8487585Z Ran 1 test in 6.635s 2023-01-11T21:48:12.8487605Z 2023-01-11T21:48:12.8487698Z OK 2023-01-11T21:48:12.8487721Z 2023-01-11T21:48:12.8487849Z Generating XML reports... 2023-01-11T21:48:12.8488316Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213334.xml 2023-01-11T21:48:12.8488704Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8488888Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8489277Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8489478Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8489499Z 2023-01-11T21:48:12.8489608Z Running tests... 2023-01-11T21:48:12.8489890Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8490221Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8490560Z test_ddp_broadcast_buffer (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8490790Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9776 2023-01-11T21:48:12.8491020Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9777 2023-01-11T21:48:12.8491400Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8491631Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8492049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8492250Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8492637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8492821Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8493421Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8493623Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8493880Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8494122Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8494542Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8494957Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8495201Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8495442Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8495740Z [1673472827.804636] [7e0e28e30a97:9777 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8495984Z [1673472827.809594] [7e0e28e30a97:9777 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8496238Z [1673472827.809594] [7e0e28e30a97:9777 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8496533Z [1673472827.801516] [7e0e28e30a97:9776 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8496776Z [1673472827.807350] [7e0e28e30a97:9776 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8497010Z [1673472827.807350] [7e0e28e30a97:9776 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8497262Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8497507Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8497611Z ok (5.924s) 2023-01-11T21:48:12.8497631Z 2023-01-11T21:48:12.8497920Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8498040Z Ran 1 test in 5.924s 2023-01-11T21:48:12.8498061Z 2023-01-11T21:48:12.8498155Z OK 2023-01-11T21:48:12.8498174Z 2023-01-11T21:48:12.8498302Z Generating XML reports... 
2023-01-11T21:48:12.8498750Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213343.xml 2023-01-11T21:48:12.8499141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8499326Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8499827Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8500028Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8500048Z 2023-01-11T21:48:12.8500160Z Running tests... 2023-01-11T21:48:12.8500444Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8500832Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8501129Z test_ddp_broadcast_buffer_via_hook (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8501341Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9894 2023-01-11T21:48:12.8501572Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9895 2023-01-11T21:48:12.8501975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8502168Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8502570Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8502773Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8503170Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8503353Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8503736Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8503936Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8504192Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8504451Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8504871Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8505286Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8505530Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8505773Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8506067Z [1673472836.256817] [7e0e28e30a97:9895 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8506311Z [1673472836.262903] [7e0e28e30a97:9895 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8506552Z [1673472836.262903] [7e0e28e30a97:9895 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8506841Z [1673472836.255868] [7e0e28e30a97:9894 :0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8507086Z [1673472836.263227] [7e0e28e30a97:9894 :0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8507344Z [1673472836.263227] [7e0e28e30a97:9894 :0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8507592Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8507840Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8508082Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8508401Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:48:12.8508505Z ok (5.933s) 2023-01-11T21:48:12.8508525Z 2023-01-11T21:48:12.8508799Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8508917Z Ran 1 test in 5.933s 2023-01-11T21:48:12.8508937Z 2023-01-11T21:48:12.8509030Z OK 2023-01-11T21:48:12.8509049Z 2023-01-11T21:48:12.8509177Z Generating XML reports... 2023-01-11T21:48:12.8509696Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213351.xml 2023-01-11T21:48:12.8510102Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8510287Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8510688Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8510875Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8510914Z 2023-01-11T21:48:12.8511008Z Running tests... 2023-01-11T21:48:12.8511288Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8511616Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8511905Z test_ddp_buffer_hook_allreduce (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8512703Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78641 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.640s) 2023-01-11T21:48:12.8512725Z 2023-01-11T21:48:12.8513003Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8513124Z Ran 1 test in 1.640s 2023-01-11T21:48:12.8513143Z 2023-01-11T21:48:12.8513254Z OK (skipped=1) 2023-01-11T21:48:12.8513273Z 2023-01-11T21:48:12.8513400Z Generating XML reports... 2023-01-11T21:48:12.8513851Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213400.xml 2023-01-11T21:48:12.8514246Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8514438Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8514840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8515038Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8515058Z 2023-01-11T21:48:12.8515170Z Running tests... 2023-01-11T21:48:12.8515452Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8515788Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8516073Z test_ddp_buffer_hook_allreduce_return_future (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8516865Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77261 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.622s) 2023-01-11T21:48:12.8516905Z 2023-01-11T21:48:12.8517167Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8517285Z Ran 1 test in 1.622s 2023-01-11T21:48:12.8517305Z 2023-01-11T21:48:12.8517416Z OK (skipped=1) 2023-01-11T21:48:12.8517435Z 2023-01-11T21:48:12.8517563Z Generating XML reports... 2023-01-11T21:48:12.8518030Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213404.xml 2023-01-11T21:48:12.8518489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8518674Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8519078Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8519316Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8519356Z 2023-01-11T21:48:12.8519458Z Running tests... 2023-01-11T21:48:12.8519745Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8520076Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8520380Z test_ddp_build_debug_param_to_name_mapping (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8520617Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10080 2023-01-11T21:48:12.8520847Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10081 2023-01-11T21:48:12.8521238Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8521420Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8521809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8522010Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8522400Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8522587Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8522985Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8523188Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8523444Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8523701Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8524105Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8524525Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8524766Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8525007Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8525223Z 2023-01-11T21:48:12.8525522Z [1673472853.011557] [7e0e28e30a97:10080:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8525764Z [1673472853.018246] [7e0e28e30a97:10080:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8526016Z [1673472853.018246] [7e0e28e30a97:10080:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8526312Z [1673472853.020177] [7e0e28e30a97:10081:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8526557Z [1673472853.026278] [7e0e28e30a97:10081:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8526792Z [1673472853.026278] [7e0e28e30a97:10081:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8526957Z ok (5.429s) 2023-01-11T21:48:12.8526978Z 2023-01-11T21:48:12.8527274Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8527391Z Ran 1 test in 5.429s 2023-01-11T21:48:12.8527411Z 2023-01-11T21:48:12.8527506Z OK 2023-01-11T21:48:12.8527525Z 2023-01-11T21:48:12.8527655Z Generating XML reports... 
2023-01-11T21:48:12.8528173Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213408.xml 2023-01-11T21:48:12.8528577Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8528764Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8529151Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8529351Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8529375Z 2023-01-11T21:48:12.8529488Z Running tests... 2023-01-11T21:48:12.8529770Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8530105Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8530430Z test_ddp_build_debug_param_to_name_mapping_requires_grad (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8530663Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10194 2023-01-11T21:48:12.8530892Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10195 2023-01-11T21:48:12.8531267Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8531452Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8531853Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8532056Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8532448Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8532631Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8533187Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8533392Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8533650Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8533890Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8534314Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8534737Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8534978Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8535218Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8535518Z [1673472861.030469] [7e0e28e30a97:10194:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8535759Z [1673472861.037105] [7e0e28e30a97:10194:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8536010Z [1673472861.037105] [7e0e28e30a97:10194:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8536300Z [1673472861.031440] [7e0e28e30a97:10195:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8536625Z [1673472861.036655] [7e0e28e30a97:10195:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8536884Z [1673472861.036655] [7e0e28e30a97:10195:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8536993Z ok (5.423s) 2023-01-11T21:48:12.8537013Z 2023-01-11T21:48:12.8537371Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8537497Z Ran 1 test in 5.423s 2023-01-11T21:48:12.8537517Z 2023-01-11T21:48:12.8537612Z OK 2023-01-11T21:48:12.8537632Z 2023-01-11T21:48:12.8537760Z Generating XML reports... 
2023-01-11T21:48:12.8538237Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213416.xml 2023-01-11T21:48:12.8538627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8538800Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8539204Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8539403Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8539424Z 2023-01-11T21:48:12.8539536Z Running tests... 2023-01-11T21:48:12.8539822Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8540153Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8540429Z test_ddp_comm_hook_logging (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8540661Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10308 2023-01-11T21:48:12.8540872Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10309 2023-01-11T21:48:12.8541270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8541452Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8541857Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8542057Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8542452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8542637Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8543036Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8543236Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8543473Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8543732Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8544153Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8544573Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8544816Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8545056Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8545304Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8545551Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8545842Z [1673472868.931142] [7e0e28e30a97:10309:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8546131Z [1673472868.937248] [7e0e28e30a97:10309:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8546384Z [1673472868.937248] [7e0e28e30a97:10309:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8546717Z [1673472868.924233] [7e0e28e30a97:10308:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8546967Z [1673472868.931228] [7e0e28e30a97:10308:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8547220Z [1673472868.931228] [7e0e28e30a97:10308:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8547327Z ok (5.931s) 2023-01-11T21:48:12.8547350Z 2023-01-11T21:48:12.8547645Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8547763Z Ran 1 test in 5.931s 2023-01-11T21:48:12.8547783Z 2023-01-11T21:48:12.8547878Z OK 2023-01-11T21:48:12.8547898Z 2023-01-11T21:48:12.8548008Z Generating XML reports... 
2023-01-11T21:48:12.8548477Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213424.xml 2023-01-11T21:48:12.8548867Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8549049Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8549450Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8549650Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8549670Z 2023-01-11T21:48:12.8549786Z Running tests... 2023-01-11T21:48:12.8550067Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8550394Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8550678Z test_ddp_control_flow_different_across_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8550908Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10426 2023-01-11T21:48:12.8551142Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10427 2023-01-11T21:48:12.8551537Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8551722Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8552126Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8552329Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8552717Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8552899Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8553283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8553512Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8553771Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8554027Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8554446Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8554861Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8555164Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8555403Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8555737Z [1673472877.432170] [7e0e28e30a97:10427:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8555970Z [1673472877.437644] [7e0e28e30a97:10427:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8556223Z [1673472877.437644] [7e0e28e30a97:10427:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8556512Z [1673472877.428836] [7e0e28e30a97:10426:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8556761Z [1673472877.435428] [7e0e28e30a97:10426:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8557016Z [1673472877.435428] [7e0e28e30a97:10426:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8557864Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T21:48:12.8557972Z ok (5.933s) 2023-01-11T21:48:12.8557992Z 2023-01-11T21:48:12.8558283Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8558405Z Ran 1 test in 5.933s 2023-01-11T21:48:12.8558424Z 2023-01-11T21:48:12.8558519Z OK 2023-01-11T21:48:12.8558538Z 2023-01-11T21:48:12.8558665Z Generating XML reports... 2023-01-11T21:48:12.8559119Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213432.xml 2023-01-11T21:48:12.8559511Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8559697Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8560100Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8560302Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8560321Z 2023-01-11T21:48:12.8560432Z Running tests... 2023-01-11T21:48:12.8560712Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8561047Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8561324Z test_ddp_control_flow_same_across_ranks (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8562119Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78235 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.643s) 2023-01-11T21:48:12.8562160Z 2023-01-11T21:48:12.8562422Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8562539Z Ran 1 test in 1.643s 2023-01-11T21:48:12.8562558Z 2023-01-11T21:48:12.8562672Z OK (skipped=1) 2023-01-11T21:48:12.8562692Z 2023-01-11T21:48:12.8562821Z Generating XML reports... 2023-01-11T21:48:12.8563291Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213441.xml 2023-01-11T21:48:12.8563746Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8563929Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8564333Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8564561Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8564601Z 2023-01-11T21:48:12.8564699Z Running tests... 2023-01-11T21:48:12.8564986Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8565314Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8565584Z test_ddp_create_graph (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8565822Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10578 2023-01-11T21:48:12.8566055Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10579 2023-01-11T21:48:12.8566449Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8566635Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8567022Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8567222Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8567613Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8567802Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8568205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8568410Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8568666Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8568922Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8569326Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8569744Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8569988Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8570230Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8570522Z [1673472889.331940] [7e0e28e30a97:10578:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8570771Z [1673472890.110640] [7e0e28e30a97:10578:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8571024Z [1673472890.110640] [7e0e28e30a97:10578:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8571974Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:48:12.8572268Z [1673472889.353361] [7e0e28e30a97:10579:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8572602Z [1673472890.114198] [7e0e28e30a97:10579:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8572993Z [1673472890.114198] [7e0e28e30a97:10579:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8574040Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:48:12.8575288Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/engine.cpp:1134.) 2023-01-11T21:48:12.8575537Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T21:48:12.8576762Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/engine.cpp:1134.) 2023-01-11T21:48:12.8577003Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T21:48:12.8577237Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8577483Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8578431Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:48:12.8579362Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:48:12.8580300Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:48:12.8581240Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:48:12.8582175Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:48:12.8583275Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:48:12.8584228Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:48:12.8585170Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:48:12.8586109Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:48:12.8587040Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T21:48:12.8587152Z ok (5.432s) 2023-01-11T21:48:12.8587173Z 2023-01-11T21:48:12.8587455Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8587571Z Ran 1 test in 5.432s 2023-01-11T21:48:12.8587591Z 2023-01-11T21:48:12.8587686Z OK 2023-01-11T21:48:12.8587705Z 2023-01-11T21:48:12.8587836Z Generating XML reports... 2023-01-11T21:48:12.8588305Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213445.xml 2023-01-11T21:48:12.8588697Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8588882Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8589266Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8589470Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8589489Z 2023-01-11T21:48:12.8589600Z Running tests... 2023-01-11T21:48:12.8589884Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8590213Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8590476Z test_ddp_device (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8591264Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77324 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.607s) 2023-01-11T21:48:12.8591286Z 2023-01-11T21:48:12.8591565Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8591742Z Ran 1 test in 1.607s 2023-01-11T21:48:12.8591762Z 2023-01-11T21:48:12.8591873Z OK (skipped=1) 2023-01-11T21:48:12.8591893Z 2023-01-11T21:48:12.8592004Z Generating XML reports... 2023-01-11T21:48:12.8592483Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213453.xml 2023-01-11T21:48:12.8592922Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8593113Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8593524Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8593728Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8593748Z 2023-01-11T21:48:12.8593860Z Running tests... 2023-01-11T21:48:12.8594142Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8594456Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8594739Z test_ddp_forward_backward_hook (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8594969Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10726 2023-01-11T21:48:12.8595201Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10727 2023-01-11T21:48:12.8595596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8595781Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8596185Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8596385Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8596773Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8596943Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8597343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8597542Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8597804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8598062Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8598484Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8598900Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8599146Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8599370Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8600203Z /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1331: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior. 2023-01-11T21:48:12.8600553Z warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes " 2023-01-11T21:48:12.8601382Z /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1331: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior. 2023-01-11T21:48:12.8601802Z warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes " 2023-01-11T21:48:12.8602097Z [1673472902.158266] [7e0e28e30a97:10726:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8602390Z [1673472902.164465] [7e0e28e30a97:10726:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8602650Z [1673472902.164465] [7e0e28e30a97:10726:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8602940Z [1673472902.165102] [7e0e28e30a97:10727:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8603182Z [1673472902.171785] [7e0e28e30a97:10727:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8603435Z [1673472902.171785] [7e0e28e30a97:10727:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8603541Z ok (5.938s) 2023-01-11T21:48:12.8603561Z 2023-01-11T21:48:12.8603832Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8603951Z Ran 1 test in 5.938s 2023-01-11T21:48:12.8603971Z 2023-01-11T21:48:12.8604064Z OK 2023-01-11T21:48:12.8604083Z 2023-01-11T21:48:12.8604217Z Generating XML reports... 2023-01-11T21:48:12.8604688Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213457.xml 2023-01-11T21:48:12.8605081Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8605267Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8605668Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8605871Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8605891Z 2023-01-11T21:48:12.8605986Z Running tests... 2023-01-11T21:48:12.8606266Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8606596Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8606882Z test_ddp_grad_div_uneven_inputs (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8607668Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78685 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.631s) 2023-01-11T21:48:12.8607689Z 2023-01-11T21:48:12.8607970Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8608091Z Ran 1 test in 1.631s 2023-01-11T21:48:12.8608111Z 2023-01-11T21:48:12.8608223Z OK (skipped=1) 2023-01-11T21:48:12.8608243Z 2023-01-11T21:48:12.8608372Z Generating XML reports... 2023-01-11T21:48:12.8608822Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213506.xml 2023-01-11T21:48:12.8609216Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8609401Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8609804Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8610006Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8610025Z 2023-01-11T21:48:12.8610139Z Running tests... 2023-01-11T21:48:12.8610419Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8610814Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8611100Z test_ddp_hook_parity_allreduce (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8611916Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77293 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.624s) 2023-01-11T21:48:12.8611958Z 2023-01-11T21:48:12.8612230Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8612348Z Ran 1 test in 1.624s 2023-01-11T21:48:12.8612368Z 2023-01-11T21:48:12.8612481Z OK (skipped=1) 2023-01-11T21:48:12.8612500Z 2023-01-11T21:48:12.8612627Z Generating XML reports... 2023-01-11T21:48:12.8613232Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213510.xml 2023-01-11T21:48:12.8613641Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8613827Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8614233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8614437Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8614457Z 2023-01-11T21:48:12.8614552Z Running tests... 2023-01-11T21:48:12.8614834Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8615162Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8615469Z test_ddp_hook_parity_allreduce_process_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8615703Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10942 2023-01-11T21:48:12.8615934Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10943 2023-01-11T21:48:12.8616324Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8616508Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8616897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8617099Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8617491Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8617677Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8618074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8618279Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8618533Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8618788Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8619213Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8619614Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8619856Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8620111Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.8620441Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8620695Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.8621121Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.8621604Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.8621905Z [1673472918.982108] [7e0e28e30a97:10942:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8622151Z [1673472918.988108] [7e0e28e30a97:10942:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8622387Z [1673472918.988108] [7e0e28e30a97:10942:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8622681Z [1673472918.990381] [7e0e28e30a97:10943:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8622925Z [1673472918.996223] [7e0e28e30a97:10943:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8623180Z [1673472918.996223] [7e0e28e30a97:10943:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8623427Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8623674Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:48:12.8623921Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8624168Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8624280Z ok (6.238s) 2023-01-11T21:48:12.8624300Z 2023-01-11T21:48:12.8624593Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8624692Z Ran 1 test in 6.238s 2023-01-11T21:48:12.8624712Z 2023-01-11T21:48:12.8624807Z OK 2023-01-11T21:48:12.8624826Z 2023-01-11T21:48:12.8624955Z Generating XML reports... 2023-01-11T21:48:12.8625425Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213514.xml 2023-01-11T21:48:12.8625820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8626006Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8626407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8626605Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8626628Z 2023-01-11T21:48:12.8626723Z Running tests... 2023-01-11T21:48:12.8627004Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8627333Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8627621Z test_ddp_hook_parity_post_localSGD (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8627856Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11060 2023-01-11T21:48:12.8628088Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11061 2023-01-11T21:48:12.8628480Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8628698Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8629087Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8629354Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8629754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8629937Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8630340Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8630591Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8630855Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8631110Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8631535Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8631934Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8632181Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8632471Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T21:48:12.8632714Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8633005Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T21:48:12.8633298Z [1673472927.758782] [7e0e28e30a97:11061:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8633543Z [1673472927.763734] [7e0e28e30a97:11061:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8633797Z [1673472927.763734] [7e0e28e30a97:11061:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8634089Z [1673472927.754532] [7e0e28e30a97:11060:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8634333Z [1673472927.760432] [7e0e28e30a97:11060:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8634571Z [1673472927.760432] [7e0e28e30a97:11060:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8634817Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8635063Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8635310Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8635552Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:48:12.8635847Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T21:48:12.8636128Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T21:48:12.8636419Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T21:48:12.8636706Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T21:48:12.8636931Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8637177Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8637420Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8637740Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8638032Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T21:48:12.8638320Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T21:48:12.8638651Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 1000 iterations 2023-01-11T21:48:12.8638946Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 1000 iterations 2023-01-11T21:48:12.8639192Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8639420Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8639662Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:48:12.8639909Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8640013Z ok (6.639s) 2023-01-11T21:48:12.8640033Z 2023-01-11T21:48:12.8640331Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8640449Z Ran 1 test in 6.640s 2023-01-11T21:48:12.8640468Z 2023-01-11T21:48:12.8640561Z OK 2023-01-11T21:48:12.8640581Z 2023-01-11T21:48:12.8640710Z Generating XML reports... 2023-01-11T21:48:12.8641168Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213523.xml 2023-01-11T21:48:12.8641561Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8641743Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8642147Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8642351Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8642371Z 2023-01-11T21:48:12.8642483Z Running tests... 2023-01-11T21:48:12.8642765Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8643092Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8643376Z test_ddp_hook_parity_powerSGD (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8644149Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77378 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.608s) 2023-01-11T21:48:12.8644190Z 2023-01-11T21:48:12.8644451Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8644574Z Ran 1 test in 1.608s 2023-01-11T21:48:12.8644593Z 2023-01-11T21:48:12.8644704Z OK (skipped=1) 2023-01-11T21:48:12.8644724Z 2023-01-11T21:48:12.8644854Z Generating XML reports... 2023-01-11T21:48:12.8645323Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213532.xml 2023-01-11T21:48:12.8645716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8645905Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8646308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8646510Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8646527Z 2023-01-11T21:48:12.8646623Z Running tests... 2023-01-11T21:48:12.8646903Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8647298Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8647585Z test_ddp_hook_pickling_powerSGD (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8647817Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11212 2023-01-11T21:48:12.8648047Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11213 2023-01-11T21:48:12.8648486Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8648679Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8649071Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8649270Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8649661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8649850Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8650253Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8650452Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8650710Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8650965Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8651382Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8651782Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8652022Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8652608Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 4; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T21:48:12.8653087Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8653687Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 4; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T21:48:12.8653950Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8654201Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8654487Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Start to apply PowerSGD after 4 iterations. 2023-01-11T21:48:12.8654773Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Start to apply PowerSGD after 4 iterations. 2023-01-11T21:48:12.8655091Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:A zero tensor of length 10 that represents local error is created. 2023-01-11T21:48:12.8655405Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:A zero tensor of length 10 that represents local error is created. 
2023-01-11T21:48:12.8655733Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Compression stats: iter 4, total before compression 10, total after compression 10, rate 1.0 2023-01-11T21:48:12.8656075Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Allocating contiguous memory of length 0 for Ps, and of length 0 for Qs, respectively. 2023-01-11T21:48:12.8656509Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Compression stats: iter 4, total before compression 10, total after compression 10, rate 1.0 2023-01-11T21:48:12.8656850Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Allocating contiguous memory of length 0 for Ps, and of length 0 for Qs, respectively. 2023-01-11T21:48:12.8657191Z [1673472941.049565] [7e0e28e30a97:11212:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8657446Z [1673472941.055372] [7e0e28e30a97:11212:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8657700Z [1673472941.055372] [7e0e28e30a97:11212:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8657990Z [1673472941.056615] [7e0e28e30a97:11213:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8658239Z [1673472941.062957] [7e0e28e30a97:11213:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8658492Z [1673472941.062957] [7e0e28e30a97:11213:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8658744Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8658971Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:48:12.8659075Z ok (6.032s) 2023-01-11T21:48:12.8659094Z 2023-01-11T21:48:12.8659393Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8659512Z Ran 1 test in 6.032s 2023-01-11T21:48:12.8659532Z 2023-01-11T21:48:12.8659628Z OK 2023-01-11T21:48:12.8659647Z 2023-01-11T21:48:12.8659777Z Generating XML reports... 2023-01-11T21:48:12.8660252Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213536.xml 2023-01-11T21:48:12.8660648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8660816Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8661223Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8661424Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8661444Z 2023-01-11T21:48:12.8661556Z Running tests... 2023-01-11T21:48:12.8661837Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8662163Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8662584Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:48:12.8662608Z 2023-01-11T21:48:12.8662891Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8663006Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8663025Z 2023-01-11T21:48:12.8663119Z OK (skipped=1) 2023-01-11T21:48:12.8663138Z 2023-01-11T21:48:12.8663266Z Generating XML reports... 
2023-01-11T21:48:12.8663738Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213545.xml 2023-01-11T21:48:12.8664134Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8664320Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8664721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8664981Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8665002Z 2023-01-11T21:48:12.8665114Z Running tests... 2023-01-11T21:48:12.8665399Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8665710Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8666184Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:48:12.8666207Z 2023-01-11T21:48:12.8666500Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8666618Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8666638Z 2023-01-11T21:48:12.8666750Z OK (skipped=1) 2023-01-11T21:48:12.8666769Z 2023-01-11T21:48:12.8666897Z Generating XML reports... 
2023-01-11T21:48:12.8667369Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213547.xml 2023-01-11T21:48:12.8667764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8667948Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8668336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8668537Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8668556Z 2023-01-11T21:48:12.8668667Z Running tests... 2023-01-11T21:48:12.8668947Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8669275Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8669755Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:48:12.8669779Z 2023-01-11T21:48:12.8670057Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8670175Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8670194Z 2023-01-11T21:48:12.8670306Z OK (skipped=1) 2023-01-11T21:48:12.8670325Z 2023-01-11T21:48:12.8670435Z Generating XML reports... 
2023-01-11T21:48:12.8670908Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213549.xml 2023-01-11T21:48:12.8671299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8671484Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8671886Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8672091Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8672110Z 2023-01-11T21:48:12.8672222Z Running tests... 2023-01-11T21:48:12.8672505Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8672832Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8673295Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:48:12.8673333Z 2023-01-11T21:48:12.8673595Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8673712Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8673732Z 2023-01-11T21:48:12.8673843Z OK (skipped=1) 2023-01-11T21:48:12.8673862Z 2023-01-11T21:48:12.8673990Z Generating XML reports... 
2023-01-11T21:48:12.8674529Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213552.xml 2023-01-11T21:48:12.8674923Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8675109Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8675560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8675748Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8675787Z 2023-01-11T21:48:12.8675881Z Running tests... 2023-01-11T21:48:12.8676167Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8676504Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8676982Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:48:12.8677007Z 2023-01-11T21:48:12.8677286Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8677404Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8677423Z 2023-01-11T21:48:12.8677534Z OK (skipped=1) 2023-01-11T21:48:12.8677553Z 2023-01-11T21:48:12.8677683Z Generating XML reports... 
2023-01-11T21:48:12.8678134Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213554.xml 2023-01-11T21:48:12.8678524Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8678708Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8679110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8679311Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8679331Z 2023-01-11T21:48:12.8679442Z Running tests... 2023-01-11T21:48:12.8679722Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8680050Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8680527Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:48:12.8680548Z 2023-01-11T21:48:12.8680830Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8680929Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8680948Z 2023-01-11T21:48:12.8681059Z OK (skipped=1) 2023-01-11T21:48:12.8681082Z 2023-01-11T21:48:12.8681210Z Generating XML reports... 
2023-01-11T21:48:12.8681680Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213556.xml 2023-01-11T21:48:12.8682074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8682259Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8682663Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8682862Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8682882Z 2023-01-11T21:48:12.8682993Z Running tests... 2023-01-11T21:48:12.8683258Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8683588Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8684125Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:48:12.8684146Z 2023-01-11T21:48:12.8684430Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8684547Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8684567Z 2023-01-11T21:48:12.8684681Z OK (skipped=1) 2023-01-11T21:48:12.8684749Z 2023-01-11T21:48:12.8684886Z Generating XML reports... 
2023-01-11T21:48:12.8685363Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213559.xml 2023-01-11T21:48:12.8685755Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8685920Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8686320Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8686526Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8686546Z 2023-01-11T21:48:12.8686656Z Running tests... 2023-01-11T21:48:12.8686938Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8687266Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8687745Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:48:12.8687765Z 2023-01-11T21:48:12.8688045Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8688162Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8688181Z 2023-01-11T21:48:12.8688275Z OK (skipped=1) 2023-01-11T21:48:12.8688319Z 2023-01-11T21:48:12.8688431Z Generating XML reports... 
2023-01-11T21:48:12.8688897Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213601.xml 2023-01-11T21:48:12.8689291Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8689474Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8689880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8690080Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8690100Z 2023-01-11T21:48:12.8690210Z Running tests... 2023-01-11T21:48:12.8690490Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8690803Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8691282Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:48:12.8691302Z 2023-01-11T21:48:12.8691581Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8691696Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8691717Z 2023-01-11T21:48:12.8691831Z OK (skipped=1) 2023-01-11T21:48:12.8691850Z 2023-01-11T21:48:12.8691980Z Generating XML reports... 
2023-01-11T21:48:12.8692448Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213604.xml 2023-01-11T21:48:12.8692840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8693215Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8693701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8693904Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8693925Z 2023-01-11T21:48:12.8694038Z Running tests... 2023-01-11T21:48:12.8694318Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8694712Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8695199Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:48:12.8695220Z 2023-01-11T21:48:12.8695506Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8695623Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8695647Z 2023-01-11T21:48:12.8695757Z OK (skipped=1) 2023-01-11T21:48:12.8695777Z 2023-01-11T21:48:12.8695887Z Generating XML reports... 
2023-01-11T21:48:12.8696356Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213606.xml 2023-01-11T21:48:12.8696748Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8696935Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8697339Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8697539Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8697559Z 2023-01-11T21:48:12.8697672Z Running tests... 2023-01-11T21:48:12.8697950Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8698277Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8698680Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:48:12.8698719Z 2023-01-11T21:48:12.8698977Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8699095Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8699114Z 2023-01-11T21:48:12.8699228Z OK (skipped=1) 2023-01-11T21:48:12.8699248Z 2023-01-11T21:48:12.8699375Z Generating XML reports... 
2023-01-11T21:48:12.8699849Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213608.xml 2023-01-11T21:48:12.8700244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8700431Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8700837Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8701018Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8701055Z 2023-01-11T21:48:12.8701148Z Running tests... 2023-01-11T21:48:12.8701427Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8701758Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8702173Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T21:48:12.8702194Z 2023-01-11T21:48:12.8702472Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8702588Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8702607Z 2023-01-11T21:48:12.8702718Z OK (skipped=1) 2023-01-11T21:48:12.8702809Z 2023-01-11T21:48:12.8702944Z Generating XML reports... 
2023-01-11T21:48:12.8703404Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213611.xml 2023-01-11T21:48:12.8703796Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8703980Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8704428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8704635Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8704655Z 2023-01-11T21:48:12.8704767Z Running tests... 2023-01-11T21:48:12.8705058Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8705389Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8705672Z test_ddp_ignore_params_arg (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8706444Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77325 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.624s) 2023-01-11T21:48:12.8706482Z 2023-01-11T21:48:12.8706748Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8706866Z Ran 1 test in 1.624s 2023-01-11T21:48:12.8706885Z 2023-01-11T21:48:12.8706994Z OK (skipped=1) 2023-01-11T21:48:12.8707013Z 2023-01-11T21:48:12.8707143Z Generating XML reports... 
2023-01-11T21:48:12.8707609Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213613.xml 2023-01-11T21:48:12.8708004Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8708192Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8708596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8708793Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8708813Z 2023-01-11T21:48:12.8708911Z Running tests... 2023-01-11T21:48:12.8709194Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8709521Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8709787Z test_ddp_inference (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8710019Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11760 2023-01-11T21:48:12.8710251Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11761 2023-01-11T21:48:12.8710646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8710833Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8711221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8711424Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8711820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8712004Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8712402Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8712600Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8712918Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8713172Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8713596Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8714043Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8714294Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8714538Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8714830Z [1673472982.407789] [7e0e28e30a97:11760:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8715080Z [1673472982.413397] [7e0e28e30a97:11760:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8715332Z [1673472982.413397] [7e0e28e30a97:11760:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8715624Z [1673472982.412044] [7e0e28e30a97:11761:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8715872Z [1673472982.417576] [7e0e28e30a97:11761:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8716124Z [1673472982.417576] [7e0e28e30a97:11761:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8716212Z ok (6.122s) 2023-01-11T21:48:12.8716250Z 2023-01-11T21:48:12.8716522Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8716642Z Ran 1 test in 6.122s 2023-01-11T21:48:12.8716665Z 2023-01-11T21:48:12.8716758Z OK 2023-01-11T21:48:12.8716778Z 2023-01-11T21:48:12.8716907Z Generating XML reports... 
2023-01-11T21:48:12.8717374Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213617.xml 2023-01-11T21:48:12.8717764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8717953Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8718355Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8718537Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8718573Z 2023-01-11T21:48:12.8718668Z Running tests... 2023-01-11T21:48:12.8718953Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8719281Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8719570Z test_ddp_join_model_equivalence (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8719803Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11874 2023-01-11T21:48:12.8720035Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11875 2023-01-11T21:48:12.8720433Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8720600Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8721005Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8721205Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8721594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8721844Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8722252Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8722451Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8722707Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8723035Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8723450Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8723866Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8724105Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8724352Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8724598Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8724844Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8725140Z [1673472991.573907] [7e0e28e30a97:11875:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8725383Z [1673472991.580700] [7e0e28e30a97:11875:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8725635Z [1673472991.580700] [7e0e28e30a97:11875:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8725924Z [1673472991.567427] [7e0e28e30a97:11874:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8726152Z [1673472991.574946] [7e0e28e30a97:11874:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8726405Z [1673472991.574946] [7e0e28e30a97:11874:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8726831Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T21:48:12.8727004Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T21:48:12.8727424Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T21:48:12.8727596Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T21:48:12.8727702Z ok (5.940s) 2023-01-11T21:48:12.8727722Z 2023-01-11T21:48:12.8728005Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8728108Z Ran 1 test in 5.940s 2023-01-11T21:48:12.8728145Z 2023-01-11T21:48:12.8728220Z OK 2023-01-11T21:48:12.8728239Z 2023-01-11T21:48:12.8728368Z Generating XML reports... 2023-01-11T21:48:12.8728840Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213626.xml 2023-01-11T21:48:12.8729232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8729420Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8729823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8730021Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8730041Z 2023-01-11T21:48:12.8730153Z Running tests... 
2023-01-11T21:48:12.8730418Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8730814Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8731089Z test_ddp_logging_data_cpu (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8731325Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11992 2023-01-11T21:48:12.8731560Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11993 2023-01-11T21:48:12.8731999Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8732192Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8732607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8732788Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8733338Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8733530Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8733934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8734133Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8734393Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8734650Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8735068Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8735484Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8735708Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8735952Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8736197Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8736443Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8736741Z [1673472998.988386] [7e0e28e30a97:11993:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8736985Z [1673472999.792536] [7e0e28e30a97:11993:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8737241Z [1673472999.792536] [7e0e28e30a97:11993:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8737530Z [1673472998.968092] [7e0e28e30a97:11992:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8737777Z [1673472999.773382] [7e0e28e30a97:11992:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8738027Z [1673472999.773382] [7e0e28e30a97:11992:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8738114Z ok (5.648s) 2023-01-11T21:48:12.8738135Z 2023-01-11T21:48:12.8738427Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8738545Z Ran 1 test in 5.648s 2023-01-11T21:48:12.8738565Z 2023-01-11T21:48:12.8738660Z OK 2023-01-11T21:48:12.8738679Z 2023-01-11T21:48:12.8738808Z Generating XML reports... 
2023-01-11T21:48:12.8739274Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213634.xml 2023-01-11T21:48:12.8739664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8739936Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8740331Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8740530Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8740550Z 2023-01-11T21:48:12.8740663Z Running tests... 2023-01-11T21:48:12.8741007Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8741356Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8741629Z test_ddp_logging_data_gpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8741861Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12136 2023-01-11T21:48:12.8742091Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12137 2023-01-11T21:48:12.8742495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8742663Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8743066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8743269Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8743659Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8743844Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8744244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8744442Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8744703Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8744939Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8745360Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8745781Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8746023Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8746265Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8746558Z [1673473007.802112] [7e0e28e30a97:12136:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8746800Z [1673473007.808049] [7e0e28e30a97:12136:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8747055Z [1673473007.808049] [7e0e28e30a97:12136:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8747345Z [1673473007.810328] [7e0e28e30a97:12137:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8747590Z [1673473007.815187] [7e0e28e30a97:12137:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8747824Z [1673473007.815187] [7e0e28e30a97:12137:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8748072Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8748319Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8748485Z ok (6.043s) 2023-01-11T21:48:12.8748504Z 2023-01-11T21:48:12.8748795Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8748913Z Ran 1 test in 6.043s 2023-01-11T21:48:12.8748933Z 2023-01-11T21:48:12.8749028Z OK 2023-01-11T21:48:12.8749047Z 2023-01-11T21:48:12.8749176Z Generating XML reports... 
2023-01-11T21:48:12.8749692Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213643.xml 2023-01-11T21:48:12.8750076Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8750263Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8750667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8750868Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8750891Z 2023-01-11T21:48:12.8751003Z Running tests... 2023-01-11T21:48:12.8751286Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8751612Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8751913Z test_ddp_model_diff_num_params_across_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8752128Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12254 2023-01-11T21:48:12.8752363Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12255 2023-01-11T21:48:12.8752753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8752939Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8753341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8753549Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8753938Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8754121Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8754548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8754733Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8754991Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8755245Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8755662Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8756081Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8756321Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8756560Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8756812Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.8757070Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.8757471Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.8757889Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.8758141Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:48:12.8758460Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:48:12.8758884Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:48:12.8759297Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:48:12.8759634Z [1673473016.417026] [7e0e28e30a97:12254:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8759886Z [1673473016.422702] [7e0e28e30a97:12254:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8760143Z [1673473016.422702] [7e0e28e30a97:12254:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8760440Z [1673473016.422039] [7e0e28e30a97:12255:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. 
Fall back to non cooperative launch. 2023-01-11T21:48:12.8760669Z [1673473016.428037] [7e0e28e30a97:12255:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8760919Z [1673473016.428037] [7e0e28e30a97:12255:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8761027Z ok (5.557s) 2023-01-11T21:48:12.8761047Z 2023-01-11T21:48:12.8761341Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8761459Z Ran 1 test in 5.557s 2023-01-11T21:48:12.8761478Z 2023-01-11T21:48:12.8761570Z OK 2023-01-11T21:48:12.8761590Z 2023-01-11T21:48:12.8761719Z Generating XML reports... 2023-01-11T21:48:12.8762188Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213651.xml 2023-01-11T21:48:12.8762585Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8762753Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8763155Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8763355Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8763375Z 2023-01-11T21:48:12.8763492Z Running tests... 2023-01-11T21:48:12.8763775Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8764106Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8764399Z test_ddp_model_diff_shape_across_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8764630Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12374 2023-01-11T21:48:12.8764849Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12375 2023-01-11T21:48:12.8765241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8765423Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8765829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8766033Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8766425Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8766608Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8767009Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8767208Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8767519Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8767775Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8768203Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8768671Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8768925Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8769169Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8769425Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.8769677Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.8770089Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.8770507Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.8770766Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:48:12.8771020Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:48:12.8771440Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:48:12.8771855Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:48:12.8772148Z [1673473024.581470] [7e0e28e30a97:12375:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8772392Z [1673473024.587472] [7e0e28e30a97:12375:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8772643Z [1673473024.587472] [7e0e28e30a97:12375:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8773187Z [1673473034.986906] [7e0e28e30a97:12375:0] tag_match.c:62 UCX WARN unexpected tag-receive descriptor 0x24732f80 was not matched 2023-01-11T21:48:12.8773484Z [1673473024.580912] [7e0e28e30a97:12374:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8773709Z [1673473024.587819] [7e0e28e30a97:12374:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8773963Z [1673473024.587819] [7e0e28e30a97:12374:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8774298Z [1673473034.950508] [7e0e28e30a97:12374:1] ucc_schedule.h:189 UCC WARN timeout 10 sec. has expired on req 0x241fe5c0, seq_num 3, TL_UCP, team_id 1, size 2, rank 0, ctx_rank 0: Barrier n/a inplace=0 bytes=0 2023-01-11T21:48:12.8774590Z [1673473034.997076] [7e0e28e30a97:12374:0] mpool.c:55 UCX WARN object 0x2430fb00 {flags:0x20040 recv length 0 host memory} was not returned to mpool ucp_requests 2023-01-11T21:48:12.8774697Z ok (15.663s) 2023-01-11T21:48:12.8774717Z 2023-01-11T21:48:12.8775009Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8775127Z Ran 1 test in 15.663s 2023-01-11T21:48:12.8775147Z 2023-01-11T21:48:12.8775242Z OK 2023-01-11T21:48:12.8775262Z 2023-01-11T21:48:12.8775390Z Generating XML reports... 
2023-01-11T21:48:12.8775842Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213659.xml 2023-01-11T21:48:12.8776327Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8776511Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8776919Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8777175Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8777197Z 2023-01-11T21:48:12.8777316Z Running tests... 2023-01-11T21:48:12.8777604Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8777934Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8778260Z test_ddp_multiple_nested_unused_params_err_ignore_params (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8778478Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12494 2023-01-11T21:48:12.8778706Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12495 2023-01-11T21:48:12.8779103Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8779292Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8779700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8779902Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8780295Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8780479Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8780861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8781065Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8781321Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8781576Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8781996Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8782414Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8782654Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8782894Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8783189Z [1673473042.573694] [7e0e28e30a97:12494:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8783434Z [1673473042.580798] [7e0e28e30a97:12494:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8783670Z [1673473042.580798] [7e0e28e30a97:12494:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8783961Z [1673473042.576018] [7e0e28e30a97:12495:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8784204Z [1673473042.582431] [7e0e28e30a97:12495:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8784462Z [1673473042.582431] [7e0e28e30a97:12495:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8784567Z ok (6.033s) 2023-01-11T21:48:12.8784586Z 2023-01-11T21:48:12.8784936Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8785054Z Ran 1 test in 6.033s 2023-01-11T21:48:12.8785074Z 2023-01-11T21:48:12.8785168Z OK 2023-01-11T21:48:12.8785187Z 2023-01-11T21:48:12.8785315Z Generating XML reports... 
2023-01-11T21:48:12.8785770Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213717.xml 2023-01-11T21:48:12.8786211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8786404Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8786813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8787013Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8787033Z 2023-01-11T21:48:12.8787147Z Running tests... 2023-01-11T21:48:12.8787434Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8787762Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8788049Z test_ddp_multiple_nested_unused_params_error (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8788281Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12612 2023-01-11T21:48:12.8788515Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12613 2023-01-11T21:48:12.8788911Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8789097Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8789498Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8789697Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8790090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8790272Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8790650Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8790853Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8791110Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8791365Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8791784Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8792198Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8792443Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8792681Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8792973Z [1673473051.093997] [7e0e28e30a97:12613:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8793202Z [1673473051.100083] [7e0e28e30a97:12613:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8793454Z [1673473051.100083] [7e0e28e30a97:12613:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8793741Z [1673473051.088619] [7e0e28e30a97:12612:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8794046Z [1673473051.095492] [7e0e28e30a97:12612:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8794300Z [1673473051.095492] [7e0e28e30a97:12612:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8794406Z ok (6.028s) 2023-01-11T21:48:12.8794426Z 2023-01-11T21:48:12.8794716Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8794879Z Ran 1 test in 6.029s 2023-01-11T21:48:12.8794900Z 2023-01-11T21:48:12.8794997Z OK 2023-01-11T21:48:12.8795016Z 2023-01-11T21:48:12.8795127Z Generating XML reports... 
2023-01-11T21:48:12.8795607Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213726.xml 2023-01-11T21:48:12.8795998Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8796184Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8796592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8796793Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8796813Z 2023-01-11T21:48:12.8796924Z Running tests... 2023-01-11T21:48:12.8797205Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8797536Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8797786Z test_ddp_namedtuple (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8798019Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12730 2023-01-11T21:48:12.8798251Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12731 2023-01-11T21:48:12.8798641Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8798830Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8799229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8799432Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8799822Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8799987Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8800386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8800585Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8800842Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8801099Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8801518Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8801934Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8802179Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8802423Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8802699Z [1673473059.623233] [7e0e28e30a97:12730:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8802945Z [1673473059.629711] [7e0e28e30a97:12730:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8803265Z [1673473059.629711] [7e0e28e30a97:12730:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8803556Z [1673473059.629087] [7e0e28e30a97:12731:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8803802Z [1673473059.636099] [7e0e28e30a97:12731:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8804093Z [1673473059.636099] [7e0e28e30a97:12731:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8804206Z ok (5.928s) 2023-01-11T21:48:12.8804225Z 2023-01-11T21:48:12.8804517Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8804635Z Ran 1 test in 5.928s 2023-01-11T21:48:12.8804655Z 2023-01-11T21:48:12.8804750Z OK 2023-01-11T21:48:12.8804769Z 2023-01-11T21:48:12.8804881Z Generating XML reports... 
2023-01-11T21:48:12.8805354Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213735.xml 2023-01-11T21:48:12.8805747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8805934Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8806340Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8806539Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8806559Z 2023-01-11T21:48:12.8806673Z Running tests... 2023-01-11T21:48:12.8806953Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8807264Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8807541Z test_ddp_new_tensor_in_fwd (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8807775Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12844 2023-01-11T21:48:12.8808005Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12845 2023-01-11T21:48:12.8808397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8808583Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8808990Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8809190Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8809579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8809744Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8810148Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8810346Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8810603Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8810856Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8811278Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8811696Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8811937Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8812174Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8812513Z [1673473068.114096] [7e0e28e30a97:12844:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8812758Z [1673473068.120388] [7e0e28e30a97:12844:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8813223Z [1673473068.120388] [7e0e28e30a97:12844:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8814085Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T21:48:12.8814380Z [1673473068.123981] [7e0e28e30a97:12845:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8814623Z [1673473068.129412] [7e0e28e30a97:12845:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8814877Z [1673473068.129412] [7e0e28e30a97:12845:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8815709Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. 
This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T21:48:12.8815964Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8816214Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8816318Z ok (5.927s) 2023-01-11T21:48:12.8816338Z 2023-01-11T21:48:12.8816634Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8816753Z Ran 1 test in 5.927s 2023-01-11T21:48:12.8816773Z 2023-01-11T21:48:12.8816868Z OK 2023-01-11T21:48:12.8816888Z 2023-01-11T21:48:12.8817017Z Generating XML reports... 2023-01-11T21:48:12.8817489Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213743.xml 2023-01-11T21:48:12.8817880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8818050Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8818453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8818653Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8818672Z 2023-01-11T21:48:12.8818783Z Running tests... 
2023-01-11T21:48:12.8819066Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8819398Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8819692Z test_ddp_new_tensor_in_fwd_static_graph (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8820486Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78338 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.637s) 2023-01-11T21:48:12.8820575Z 2023-01-11T21:48:12.8820871Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8820969Z Ran 1 test in 1.637s 2023-01-11T21:48:12.8821006Z 2023-01-11T21:48:12.8821101Z OK (skipped=1) 2023-01-11T21:48:12.8821121Z 2023-01-11T21:48:12.8821250Z Generating XML reports... 2023-01-11T21:48:12.8821768Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213751.xml 2023-01-11T21:48:12.8822176Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8822361Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8822765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8822965Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8822990Z 2023-01-11T21:48:12.8823106Z Running tests... 
2023-01-11T21:48:12.8823368Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8823695Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8823989Z test_ddp_profiling_autograd_profiler (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8824775Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77342 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.626s) 2023-01-11T21:48:12.8824796Z 2023-01-11T21:48:12.8825074Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8825191Z Ran 1 test in 1.627s 2023-01-11T21:48:12.8825211Z 2023-01-11T21:48:12.8825327Z OK (skipped=1) 2023-01-11T21:48:12.8825346Z 2023-01-11T21:48:12.8825476Z Generating XML reports... 2023-01-11T21:48:12.8825948Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213756.xml 2023-01-11T21:48:12.8826337Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8826508Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8826914Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8827115Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8827135Z 2023-01-11T21:48:12.8827249Z Running tests... 
2023-01-11T21:48:12.8827534Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8827862Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8828154Z test_ddp_profiling_torch_profiler (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8828385Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13030 2023-01-11T21:48:12.8828598Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13031 2023-01-11T21:48:12.8828993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8829178Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8829582Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8829783Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8830172Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8830431Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8830841Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8831041Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8831279Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8831583Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8832017Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8832438Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8832680Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8832924Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8833217Z [1673473084.861461] [7e0e28e30a97:13030:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8833461Z [1673473084.866710] [7e0e28e30a97:13030:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8833717Z [1673473084.866710] [7e0e28e30a97:13030:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8834056Z STAGE:2023-01-11 21:38:05 13030:13030 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8834345Z [1673473084.864118] [7e0e28e30a97:13031:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8834589Z [1673473084.869800] [7e0e28e30a97:13031:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8834849Z [1673473084.869800] [7e0e28e30a97:13031:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8835201Z STAGE:2023-01-11 21:38:05 13031:13031 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8835450Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8835698Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:48:12.8836061Z STAGE:2023-01-11 21:38:06 13030:13030 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8836416Z STAGE:2023-01-11 21:38:06 13031:13031 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8836767Z STAGE:2023-01-11 21:38:06 13031:13031 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8837140Z STAGE:2023-01-11 21:38:06 13030:13030 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8837982Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T21:48:12.8838820Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T21:48:12.8839238Z STAGE:2023-01-11 21:38:06 13030:13030 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8839587Z STAGE:2023-01-11 21:38:06 13031:13031 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8839988Z STAGE:2023-01-11 21:38:06 13031:13031 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8840354Z STAGE:2023-01-11 21:38:06 13030:13030 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8840723Z STAGE:2023-01-11 21:38:06 13031:13031 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8841091Z STAGE:2023-01-11 21:38:06 13030:13030 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8841201Z ok (6.528s) 2023-01-11T21:48:12.8841222Z 2023-01-11T21:48:12.8841503Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8841621Z Ran 1 test in 6.528s 2023-01-11T21:48:12.8841641Z 2023-01-11T21:48:12.8841717Z OK 2023-01-11T21:48:12.8841736Z 2023-01-11T21:48:12.8841865Z Generating XML reports... 2023-01-11T21:48:12.8842336Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213800.xml 2023-01-11T21:48:12.8842732Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8842918Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8843321Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8843521Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8843544Z 2023-01-11T21:48:12.8843658Z Running tests... 
2023-01-11T21:48:12.8843921Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8844251Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8844530Z test_ddp_python_error_logged (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8844766Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13152 2023-01-11T21:48:12.8844997Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13153 2023-01-11T21:48:12.8845391Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8845576Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8845977Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8846181Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8846555Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8846740Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8847141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8847345Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8847599Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8847853Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8848275Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8848693Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8849000Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8849224Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8849559Z [1673473093.952814] [7e0e28e30a97:13152:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8849814Z [1673473093.959223] [7e0e28e30a97:13152:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8850072Z [1673473093.959223] [7e0e28e30a97:13152:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8850363Z [1673473093.959125] [7e0e28e30a97:13153:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8850608Z [1673473093.964499] [7e0e28e30a97:13153:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8850861Z [1673473093.964499] [7e0e28e30a97:13153:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8850966Z ok (5.413s) 2023-01-11T21:48:12.8850986Z 2023-01-11T21:48:12.8851282Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8851381Z Ran 1 test in 5.413s 2023-01-11T21:48:12.8851419Z 2023-01-11T21:48:12.8851495Z OK 2023-01-11T21:48:12.8851514Z 2023-01-11T21:48:12.8851643Z Generating XML reports... 
2023-01-11T21:48:12.8852115Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213809.xml 2023-01-11T21:48:12.8852507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8852697Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8853249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8853454Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8853475Z 2023-01-11T21:48:12.8853589Z Running tests... 2023-01-11T21:48:12.8853857Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8854186Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8854473Z test_ddp_returns_tensor_with_no_grad (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8855288Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78595 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.626s) 2023-01-11T21:48:12.8855313Z 2023-01-11T21:48:12.8855595Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8855713Z Ran 1 test in 1.626s 2023-01-11T21:48:12.8855733Z 2023-01-11T21:48:12.8855845Z OK (skipped=1) 2023-01-11T21:48:12.8855864Z 2023-01-11T21:48:12.8855993Z Generating XML reports... 
2023-01-11T21:48:12.8856467Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213817.xml 2023-01-11T21:48:12.8856860Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8857027Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8857429Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8857629Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8857726Z 2023-01-11T21:48:12.8857847Z Running tests... 2023-01-11T21:48:12.8858134Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8858464Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8858761Z test_ddp_shared_grad_acc_unused_params (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8859055Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13300 2023-01-11T21:48:12.8859277Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13301 2023-01-11T21:48:12.8859676Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8859859Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8860259Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8860464Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8860857Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8861042Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8861447Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8861645Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8861884Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8862142Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8862563Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8862990Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8863238Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8863480Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8864444Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T21:48:12.8864562Z warnings.warn( 2023-01-11T21:48:12.8865522Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T21:48:12.8865643Z warnings.warn( 2023-01-11T21:48:12.8865873Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8866124Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8866416Z [1673473106.034218] [7e0e28e30a97:13300:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8866663Z [1673473106.040314] [7e0e28e30a97:13300:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8866915Z [1673473106.040314] [7e0e28e30a97:13300:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8867302Z [1673473106.034916] [7e0e28e30a97:13301:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8867546Z [1673473106.040288] [7e0e28e30a97:13301:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8867846Z [1673473106.040288] [7e0e28e30a97:13301:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8867958Z ok (5.936s) 2023-01-11T21:48:12.8867978Z 2023-01-11T21:48:12.8868275Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8868374Z Ran 1 test in 5.936s 2023-01-11T21:48:12.8868394Z 2023-01-11T21:48:12.8868488Z OK 2023-01-11T21:48:12.8868507Z 2023-01-11T21:48:12.8868636Z Generating XML reports... 2023-01-11T21:48:12.8869109Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213821.xml 2023-01-11T21:48:12.8869504Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8869694Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8870096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8870299Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8870319Z 2023-01-11T21:48:12.8870433Z Running tests... 
2023-01-11T21:48:12.8870696Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8871026Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8871317Z test_ddp_static_graph_nested_types (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8872110Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77625 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.622s) 2023-01-11T21:48:12.8872132Z 2023-01-11T21:48:12.8872412Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8872528Z Ran 1 test in 1.623s 2023-01-11T21:48:12.8872551Z 2023-01-11T21:48:12.8872664Z OK (skipped=1) 2023-01-11T21:48:12.8872684Z 2023-01-11T21:48:12.8872812Z Generating XML reports... 2023-01-11T21:48:12.8873282Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213829.xml 2023-01-11T21:48:12.8873655Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8873839Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8874248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8874447Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8874466Z 2023-01-11T21:48:12.8874578Z Running tests... 
2023-01-11T21:48:12.8874860Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8875192Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8875478Z test_ddp_sync_bn_training_vs_eval (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8875709Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13452 2023-01-11T21:48:12.8875923Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13453 2023-01-11T21:48:12.8876321Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8876569Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8876976Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8877177Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8877615Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8877806Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8878212Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8878392Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8878649Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8878908Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8879330Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8879746Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8879994Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8880237Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8880530Z [1673473118.662016] [7e0e28e30a97:13453:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8880774Z [1673473118.666899] [7e0e28e30a97:13453:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8881031Z [1673473118.666899] [7e0e28e30a97:13453:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8881366Z STAGE:2023-01-11 21:38:39 13453:13453 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8881656Z [1673473118.652247] [7e0e28e30a97:13452:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8881903Z [1673473118.658165] [7e0e28e30a97:13452:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8882157Z [1673473118.658165] [7e0e28e30a97:13452:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8882512Z STAGE:2023-01-11 21:38:39 13452:13452 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8882759Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T21:48:12.8883005Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T21:48:12.8883364Z STAGE:2023-01-11 21:38:39 13452:13452 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8883717Z STAGE:2023-01-11 21:38:39 13453:13453 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8884070Z STAGE:2023-01-11 21:38:39 13452:13452 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8884443Z STAGE:2023-01-11 21:38:39 13453:13453 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8884794Z STAGE:2023-01-11 21:38:39 13452:13452 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.8885151Z STAGE:2023-01-11 21:38:40 13452:13452 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.8885522Z STAGE:2023-01-11 21:38:40 13452:13452 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.8885688Z ok (6.727s) 2023-01-11T21:48:12.8885708Z 2023-01-11T21:48:12.8885997Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8886114Z Ran 1 test in 6.728s 2023-01-11T21:48:12.8886134Z 2023-01-11T21:48:12.8886227Z OK 2023-01-11T21:48:12.8886247Z 2023-01-11T21:48:12.8886357Z Generating XML reports... 
2023-01-11T21:48:12.8886876Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213834.xml 2023-01-11T21:48:12.8887283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8887468Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8887872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8888073Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8888096Z 2023-01-11T21:48:12.8888209Z Running tests... 2023-01-11T21:48:12.8888494Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8888804Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8889082Z test_ddp_sync_module_states (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8889317Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13574 2023-01-11T21:48:12.8889548Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13575 2023-01-11T21:48:12.8889940Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8890126Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8890529Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8890734Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8891122Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8891285Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8891689Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8891888Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8892144Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8892399Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8892820Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8893408Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8893653Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8893875Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8894170Z [1673473127.850412] [7e0e28e30a97:13574:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8894417Z [1673473127.855603] [7e0e28e30a97:13574:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8894670Z [1673473127.855603] [7e0e28e30a97:13574:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8894960Z [1673473127.855736] [7e0e28e30a97:13575:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8895301Z [1673473127.862075] [7e0e28e30a97:13575:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8895550Z [1673473127.862075] [7e0e28e30a97:13575:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8895657Z ok (5.424s) 2023-01-11T21:48:12.8895677Z 2023-01-11T21:48:12.8896029Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8896154Z Ran 1 test in 5.424s 2023-01-11T21:48:12.8896174Z 2023-01-11T21:48:12.8896250Z OK 2023-01-11T21:48:12.8896270Z 2023-01-11T21:48:12.8896399Z Generating XML reports... 
2023-01-11T21:48:12.8896877Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213843.xml 2023-01-11T21:48:12.8897271Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8897459Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8897860Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8898060Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8898080Z 2023-01-11T21:48:12.8898194Z Running tests... 2023-01-11T21:48:12.8898478Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8898790Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8899076Z test_ddp_uneven_input_exception (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8899307Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13688 2023-01-11T21:48:12.8899538Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13689 2023-01-11T21:48:12.8899934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8900119Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8900520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8900720Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8901095Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8901279Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8901682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8901882Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8902138Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8902397Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8902819Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8903239Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8903481Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8903703Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8904077Z [1673473135.864966] [7e0e28e30a97:13689:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8904356Z [1673473135.872333] [7e0e28e30a97:13689:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8904693Z [1673473135.872333] [7e0e28e30a97:13689:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8905001Z [1673473135.858172] [7e0e28e30a97:13688:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8905310Z [1673473135.863452] [7e0e28e30a97:13688:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8905540Z [1673473135.863452] [7e0e28e30a97:13688:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8905813Z ok (5.539s) 2023-01-11T21:48:12.8905834Z 2023-01-11T21:48:12.8906152Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8906303Z Ran 1 test in 5.540s 2023-01-11T21:48:12.8906323Z 2023-01-11T21:48:12.8906453Z OK 2023-01-11T21:48:12.8906473Z 2023-01-11T21:48:12.8906632Z Generating XML reports... 
2023-01-11T21:48:12.8907152Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213851.xml 2023-01-11T21:48:12.8907563Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8907723Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8908153Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8908428Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8908449Z 2023-01-11T21:48:12.8908595Z Running tests... 2023-01-11T21:48:12.8908896Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8909245Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8909559Z test_ddp_uneven_input_join_disable (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8910345Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78684 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.633s) 2023-01-11T21:48:12.8910371Z 2023-01-11T21:48:12.8910682Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8910832Z Ran 1 test in 1.633s 2023-01-11T21:48:12.8910852Z 2023-01-11T21:48:12.8910943Z OK (skipped=1) 2023-01-11T21:48:12.8910962Z 2023-01-11T21:48:12.8911162Z Generating XML reports... 
2023-01-11T21:48:12.8911643Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213859.xml 2023-01-11T21:48:12.8912055Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8912268Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8912686Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8912947Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8912967Z 2023-01-11T21:48:12.8913120Z Running tests... 2023-01-11T21:48:12.8913430Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8913725Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8914059Z test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8914839Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/75648 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.597s) 2023-01-11T21:48:12.8914915Z 2023-01-11T21:48:12.8915228Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8915387Z Ran 1 test in 1.598s 2023-01-11T21:48:12.8915407Z 2023-01-11T21:48:12.8915549Z OK (skipped=1) 2023-01-11T21:48:12.8915568Z 2023-01-11T21:48:12.8915775Z Generating XML reports... 
2023-01-11T21:48:12.8916277Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213903.xml 2023-01-11T21:48:12.8916683Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8916843Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8917301Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8917530Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8917551Z 2023-01-11T21:48:12.8917704Z Running tests... 2023-01-11T21:48:12.8918035Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8918384Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8918711Z test_ddp_uneven_inputs_stop_iteration_sync_bn (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8919491Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78113 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.610s) 2023-01-11T21:48:12.8919513Z 2023-01-11T21:48:12.8919811Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8919996Z Ran 1 test in 1.611s 2023-01-11T21:48:12.8920016Z 2023-01-11T21:48:12.8920108Z OK (skipped=1) 2023-01-11T21:48:12.8920126Z 2023-01-11T21:48:12.8920297Z Generating XML reports... 
2023-01-11T21:48:12.8920785Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213907.xml 2023-01-11T21:48:12.8921193Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8921402Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8921816Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8922043Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8922063Z 2023-01-11T21:48:12.8922205Z Running tests... 2023-01-11T21:48:12.8922454Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8922852Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8923183Z test_ddp_unused_params_rebuild_buckets_exception (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8923437Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13904 2023-01-11T21:48:12.8923721Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13905 2023-01-11T21:48:12.8924129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8924341Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8924755Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8924989Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8925409Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8925664Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8926076Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8926304Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8926633Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8926925Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8927378Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8927809Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8928082Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8928293Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8928644Z [1673473156.456757] [7e0e28e30a97:13904:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8928957Z [1673473156.463283] [7e0e28e30a97:13904:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8929271Z [1673473156.463283] [7e0e28e30a97:13904:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8929581Z [1673473156.458743] [7e0e28e30a97:13905:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8929858Z [1673473156.465178] [7e0e28e30a97:13905:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8930135Z [1673473156.465178] [7e0e28e30a97:13905:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8930278Z ok (6.021s) 2023-01-11T21:48:12.8930299Z 2023-01-11T21:48:12.8930605Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8930702Z Ran 1 test in 6.021s 2023-01-11T21:48:12.8930812Z 2023-01-11T21:48:12.8930891Z OK 2023-01-11T21:48:12.8930909Z 2023-01-11T21:48:12.8931072Z Generating XML reports... 
2023-01-11T21:48:12.8931564Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213911.xml 2023-01-11T21:48:12.8931980Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8932190Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8932611Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8932838Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8933005Z 2023-01-11T21:48:12.8933163Z Running tests... 2023-01-11T21:48:12.8933421Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8933817Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8934160Z test_ddp_zero_output_features (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8934427Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14022 2023-01-11T21:48:12.8934682Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14023 2023-01-11T21:48:12.8935090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8935398Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8935820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8936047Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8936460Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8936738Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8937157Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8937381Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8937661Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8937946Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8938385Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8938819Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8939037Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8939487Z /opt/conda/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op 2023-01-11T21:48:12.8939818Z warnings.warn("Initializing zero-element tensors is a no-op") 2023-01-11T21:48:12.8940089Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8940500Z /opt/conda/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op 2023-01-11T21:48:12.8940789Z warnings.warn("Initializing zero-element tensors is a no-op") 2023-01-11T21:48:12.8941101Z [1673473164.987258] [7e0e28e30a97:14022:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.8941368Z [1673473164.994017] [7e0e28e30a97:14022:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8941654Z [1673473164.994017] [7e0e28e30a97:14022:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8941963Z [1673473164.990045] [7e0e28e30a97:14023:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.8942179Z [1673473164.995017] [7e0e28e30a97:14023:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.8942491Z [1673473164.995017] [7e0e28e30a97:14023:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.8942636Z ok (5.438s) 2023-01-11T21:48:12.8942656Z 2023-01-11T21:48:12.8942961Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8943109Z Ran 1 test in 5.439s 2023-01-11T21:48:12.8943130Z 2023-01-11T21:48:12.8943257Z OK 2023-01-11T21:48:12.8943276Z 2023-01-11T21:48:12.8943443Z Generating XML reports... 2023-01-11T21:48:12.8943934Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213920.xml 2023-01-11T21:48:12.8944367Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8944529Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8944983Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8945271Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8945291Z 2023-01-11T21:48:12.8956735Z Running tests... 2023-01-11T21:48:12.8957091Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8957426Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8957803Z test_destroy_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8958044Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14136 2023-01-11T21:48:12.8958266Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14137 2023-01-11T21:48:12.8958637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8958817Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8959202Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8959401Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8959769Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8959945Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8960322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8960514Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8960743Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8960989Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8961395Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8961799Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8962031Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8962271Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.8962503Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8962740Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.8963137Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.8963513Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.8963626Z ok (4.339s) 2023-01-11T21:48:12.8963648Z 2023-01-11T21:48:12.8963915Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8964032Z Ran 1 test in 4.339s 2023-01-11T21:48:12.8964052Z 2023-01-11T21:48:12.8964151Z OK 2023-01-11T21:48:12.8964169Z 2023-01-11T21:48:12.8964298Z Generating XML reports... 2023-01-11T21:48:12.8964755Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213928.xml 2023-01-11T21:48:12.8965130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8965309Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8965671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8965867Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8965947Z 2023-01-11T21:48:12.8966064Z Running tests... 
2023-01-11T21:48:12.8966335Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8966649Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8966906Z test_destroy_group (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8967179Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14239 2023-01-11T21:48:12.8967408Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14240 2023-01-11T21:48:12.8967765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8967945Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8968325Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8968523Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8968892Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8969071Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8969450Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8969647Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8969893Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8970119Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8970521Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8970923Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8971155Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8971398Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.8971623Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8971864Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.8972262Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.8972651Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.8972737Z ok (4.331s) 2023-01-11T21:48:12.8972758Z 2023-01-11T21:48:12.8973241Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8973364Z Ran 1 test in 4.331s 2023-01-11T21:48:12.8973384Z 2023-01-11T21:48:12.8973481Z OK 2023-01-11T21:48:12.8973501Z 2023-01-11T21:48:12.8973630Z Generating XML reports... 
2023-01-11T21:48:12.8974087Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213935.xml 2023-01-11T21:48:12.8974461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8974641Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8975003Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8975195Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8975214Z 2023-01-11T21:48:12.8975328Z Running tests... 2023-01-11T21:48:12.8975724Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8976037Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8976316Z test_detect_ddp_is_actually_static (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8977125Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78767 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.645s) 2023-01-11T21:48:12.8977150Z 2023-01-11T21:48:12.8977422Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8977541Z Ran 1 test in 1.645s 2023-01-11T21:48:12.8977560Z 2023-01-11T21:48:12.8977669Z OK (skipped=1) 2023-01-11T21:48:12.8977689Z 2023-01-11T21:48:12.8977795Z Generating XML reports... 
2023-01-11T21:48:12.8978255Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213942.xml 2023-01-11T21:48:12.8978626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8978804Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8979185Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8979379Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8979399Z 2023-01-11T21:48:12.8979512Z Running tests... 2023-01-11T21:48:12.8979777Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8980067Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8980347Z test_different_graph_across_ranks (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8981093Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78748 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.605s) 2023-01-11T21:48:12.8981114Z 2023-01-11T21:48:12.8981377Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8981492Z Ran 1 test in 1.606s 2023-01-11T21:48:12.8981512Z 2023-01-11T21:48:12.8981620Z OK (skipped=1) 2023-01-11T21:48:12.8981639Z 2023-01-11T21:48:12.8981765Z Generating XML reports... 
2023-01-11T21:48:12.8982210Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213946.xml 2023-01-11T21:48:12.8982579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8982763Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8983121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8983315Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8983334Z 2023-01-11T21:48:12.8983445Z Running tests... 2023-01-11T21:48:12.8983712Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8984022Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8984293Z test_dump_DDP_relevant_env_vars (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.8984517Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14410 2023-01-11T21:48:12.8984737Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14411 2023-01-11T21:48:12.8985183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8985341Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8985720Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8985913Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8986321Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8986503Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8986882Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8987072Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8987319Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.8987550Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.8987951Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.8988347Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.8988582Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.8988815Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.8988921Z ok (4.246s) 2023-01-11T21:48:12.8988940Z 2023-01-11T21:48:12.8989208Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8989321Z Ran 1 test in 4.246s 2023-01-11T21:48:12.8989341Z 2023-01-11T21:48:12.8989435Z OK 2023-01-11T21:48:12.8989458Z 2023-01-11T21:48:12.8989565Z Generating XML reports... 2023-01-11T21:48:12.8990013Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213950.xml 2023-01-11T21:48:12.8990387Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8990565Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8990946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8991140Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8991159Z 2023-01-11T21:48:12.8991269Z Running tests... 2023-01-11T21:48:12.8991534Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8991825Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8992087Z test_gather (__main__.TestDistBackendWithSpawn) ... 
skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:48:12.8992108Z 2023-01-11T21:48:12.8992370Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8992486Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8992505Z 2023-01-11T21:48:12.8992615Z OK (skipped=1) 2023-01-11T21:48:12.8992634Z 2023-01-11T21:48:12.8992758Z Generating XML reports... 2023-01-11T21:48:12.8993205Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213957.xml 2023-01-11T21:48:12.8993574Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8993753Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8994110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8994363Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8994382Z 2023-01-11T21:48:12.8994493Z Running tests... 2023-01-11T21:48:12.8994766Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8995077Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8995390Z test_gather_checks (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:48:12.8995412Z 2023-01-11T21:48:12.8995681Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8995798Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8995817Z 2023-01-11T21:48:12.8995927Z OK (skipped=1) 2023-01-11T21:48:12.8995946Z 2023-01-11T21:48:12.8996053Z Generating XML reports... 
2023-01-11T21:48:12.8996495Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213959.xml 2023-01-11T21:48:12.8996866Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.8997044Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.8997419Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.8997610Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.8997634Z 2023-01-11T21:48:12.8997746Z Running tests... 2023-01-11T21:48:12.8998010Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8998319Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.8998556Z test_gather_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA gather (0.002s) 2023-01-11T21:48:12.8998577Z 2023-01-11T21:48:12.8998837Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.8998953Z Ran 1 test in 0.002s 2023-01-11T21:48:12.8998972Z 2023-01-11T21:48:12.8999081Z OK (skipped=1) 2023-01-11T21:48:12.8999100Z 2023-01-11T21:48:12.8999227Z Generating XML reports... 
2023-01-11T21:48:12.8999672Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214002.xml 2023-01-11T21:48:12.9000045Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9000224Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9000586Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9000777Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9000797Z 2023-01-11T21:48:12.9000911Z Running tests... 2023-01-11T21:48:12.9001176Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9001492Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9001761Z test_gather_full_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:48:12.9001781Z 2023-01-11T21:48:12.9002043Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9002158Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9002177Z 2023-01-11T21:48:12.9002289Z OK (skipped=1) 2023-01-11T21:48:12.9002308Z 2023-01-11T21:48:12.9002415Z Generating XML reports... 
2023-01-11T21:48:12.9002862Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214004.xml 2023-01-11T21:48:12.9003232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9003409Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9003856Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9004050Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9004069Z 2023-01-11T21:48:12.9004179Z Running tests... 2023-01-11T21:48:12.9004444Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9004800Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9005055Z test_gather_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:48:12.9005075Z 2023-01-11T21:48:12.9005338Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9005452Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9005472Z 2023-01-11T21:48:12.9005582Z OK (skipped=1) 2023-01-11T21:48:12.9005601Z 2023-01-11T21:48:12.9005727Z Generating XML reports... 
2023-01-11T21:48:12.9006175Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214006.xml 2023-01-11T21:48:12.9006546Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9006723Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9007105Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9007278Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9007297Z 2023-01-11T21:48:12.9007408Z Running tests... 2023-01-11T21:48:12.9007671Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9007983Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9008249Z test_gather_object (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:48:12.9008272Z 2023-01-11T21:48:12.9008533Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9008648Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9008667Z 2023-01-11T21:48:12.9008776Z OK (skipped=1) 2023-01-11T21:48:12.9008794Z 2023-01-11T21:48:12.9008901Z Generating XML reports... 
2023-01-11T21:48:12.9009343Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214009.xml 2023-01-11T21:48:12.9009712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9009890Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9010266Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9010460Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9010483Z 2023-01-11T21:48:12.9010593Z Running tests... 2023-01-11T21:48:12.9010855Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9011164Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9011422Z test_gather_object_subgroup (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:48:12.9011442Z 2023-01-11T21:48:12.9011705Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9011820Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9011839Z 2023-01-11T21:48:12.9011950Z OK (skipped=1) 2023-01-11T21:48:12.9011969Z 2023-01-11T21:48:12.9012095Z Generating XML reports... 
2023-01-11T21:48:12.9012540Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214011.xml 2023-01-11T21:48:12.9013053Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9013329Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9013717Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9013890Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9013910Z 2023-01-11T21:48:12.9014019Z Running tests... 2023-01-11T21:48:12.9014345Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9014671Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9014921Z test_get_backend (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9015144Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14744 2023-01-11T21:48:12.9015363Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14745 2023-01-11T21:48:12.9015740Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9015897Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9016276Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9016471Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9016838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9017014Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9017388Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9017581Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9017832Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9018080Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9018461Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9018865Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9019100Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9019342Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.9019566Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9019804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.9020201Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9020595Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9020700Z ok (4.339s) 2023-01-11T21:48:12.9020720Z 2023-01-11T21:48:12.9020969Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9021087Z Ran 1 test in 4.340s 2023-01-11T21:48:12.9021106Z 2023-01-11T21:48:12.9021200Z OK 2023-01-11T21:48:12.9021219Z 2023-01-11T21:48:12.9021345Z Generating XML reports... 2023-01-11T21:48:12.9021792Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214013.xml 2023-01-11T21:48:12.9022163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9022401Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9022785Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9022957Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9022996Z 2023-01-11T21:48:12.9023088Z Running tests... 
2023-01-11T21:48:12.9023353Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9023718Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9024003Z test_get_future (__main__.TestDistBackendWithSpawn) ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T21:48:12.9024022Z 2023-01-11T21:48:12.9024290Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9024405Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9024425Z 2023-01-11T21:48:12.9024536Z OK (skipped=1) 2023-01-11T21:48:12.9024559Z 2023-01-11T21:48:12.9024686Z Generating XML reports... 2023-01-11T21:48:12.9025111Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214020.xml 2023-01-11T21:48:12.9025476Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9025657Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9026037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9026229Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9026249Z 2023-01-11T21:48:12.9026359Z Running tests... 2023-01-11T21:48:12.9026624Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9026934Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9027181Z test_get_rank (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9027385Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14880 2023-01-11T21:48:12.9027604Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14881 2023-01-11T21:48:12.9027973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9028154Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9028533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9028725Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9029091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9029267Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9029628Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9029817Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9030065Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9030313Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9030712Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9031109Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9031342Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9031571Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9031733Z ok (4.446s) 2023-01-11T21:48:12.9031753Z 2023-01-11T21:48:12.9032003Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9032118Z Ran 1 test in 4.446s 2023-01-11T21:48:12.9032138Z 2023-01-11T21:48:12.9032233Z OK 2023-01-11T21:48:12.9032252Z 2023-01-11T21:48:12.9032377Z Generating XML reports... 2023-01-11T21:48:12.9032870Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214023.xml 2023-01-11T21:48:12.9033259Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9033438Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9033814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9033987Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9034031Z 2023-01-11T21:48:12.9034122Z Running tests... 2023-01-11T21:48:12.9034387Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9034697Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9034966Z test_get_rank_size_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9035191Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14983 2023-01-11T21:48:12.9035412Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14984 2023-01-11T21:48:12.9035780Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9035956Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9036315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9036513Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9036878Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9037054Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9037428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9037619Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9037866Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9038110Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9038489Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9038891Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9039123Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9039366Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.9039594Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9039830Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.9040225Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9040616Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9040791Z ok (4.339s) 2023-01-11T21:48:12.9040811Z 2023-01-11T21:48:12.9041064Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9041179Z Ran 1 test in 4.339s 2023-01-11T21:48:12.9041198Z 2023-01-11T21:48:12.9041295Z OK 2023-01-11T21:48:12.9041314Z 2023-01-11T21:48:12.9041440Z Generating XML reports... 2023-01-11T21:48:12.9041933Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214030.xml 2023-01-11T21:48:12.9042320Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9042502Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9042882Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9043075Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9043099Z 2023-01-11T21:48:12.9043191Z Running tests... 
2023-01-11T21:48:12.9043457Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9043769Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9044031Z test_get_rank_size_group (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9044256Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15086 2023-01-11T21:48:12.9044476Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15087 2023-01-11T21:48:12.9044849Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9045028Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9045387Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9045583Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9045947Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9046126Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9046502Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9046695Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9046943Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9047188Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9047587Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9047961Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9048197Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9048439Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.9048664Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9048903Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.9049301Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9049692Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9049798Z ok (4.429s) 2023-01-11T21:48:12.9049817Z 2023-01-11T21:48:12.9050082Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9050240Z Ran 1 test in 4.429s 2023-01-11T21:48:12.9050259Z 2023-01-11T21:48:12.9050353Z OK 2023-01-11T21:48:12.9050372Z 2023-01-11T21:48:12.9050499Z Generating XML reports... 
2023-01-11T21:48:12.9050953Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214037.xml 2023-01-11T21:48:12.9051369Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9051553Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9051940Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9052134Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9052154Z 2023-01-11T21:48:12.9052265Z Running tests... 2023-01-11T21:48:12.9052509Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9052824Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9053321Z test_invalid_static_graph (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9053553Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15189 2023-01-11T21:48:12.9053777Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15190 2023-01-11T21:48:12.9054156Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9054332Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9054712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9054885Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9055255Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9055431Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9055804Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9056016Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9056266Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9056512Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9056910Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9057309Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9057524Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9057757Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9058037Z [1673473248.682267] [7e0e28e30a97:15190:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9058276Z [1673473248.688083] [7e0e28e30a97:15190:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9058518Z [1673473248.688083] [7e0e28e30a97:15190:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9058792Z [1673473248.680490] [7e0e28e30a97:15189:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9059024Z [1673473248.685757] [7e0e28e30a97:15189:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9059354Z [1673473248.685757] [7e0e28e30a97:15189:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9059459Z ok (5.929s) 2023-01-11T21:48:12.9059479Z 2023-01-11T21:48:12.9059736Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9059852Z Ran 1 test in 5.929s 2023-01-11T21:48:12.9059871Z 2023-01-11T21:48:12.9060025Z OK 2023-01-11T21:48:12.9060048Z 2023-01-11T21:48:12.9060182Z Generating XML reports... 
2023-01-11T21:48:12.9060634Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214044.xml 2023-01-11T21:48:12.9061005Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9061183Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9061566Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9061760Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9061780Z 2023-01-11T21:48:12.9061872Z Running tests... 2023-01-11T21:48:12.9062140Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9062455Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9062694Z test_irecv (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9062914Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15307 2023-01-11T21:48:12.9063132Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15308 2023-01-11T21:48:12.9063501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9063684Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9064042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9064237Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9064599Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9064779Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9065153Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9065344Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9065593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9065839Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9066237Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9066613Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9066843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9067076Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9067354Z [1673473256.412011] [7e0e28e30a97:15308:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9067589Z [1673473257.181667] [7e0e28e30a97:15308:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9067834Z [1673473257.181667] [7e0e28e30a97:15308:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9068167Z [1673473256.390724] [7e0e28e30a97:15307:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9068397Z [1673473257.173255] [7e0e28e30a97:15307:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9068703Z [1673473257.173255] [7e0e28e30a97:15307:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9068815Z ok (5.430s) 2023-01-11T21:48:12.9068835Z 2023-01-11T21:48:12.9069091Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9069207Z Ran 1 test in 5.430s 2023-01-11T21:48:12.9069226Z 2023-01-11T21:48:12.9069322Z OK 2023-01-11T21:48:12.9069341Z 2023-01-11T21:48:12.9069466Z Generating XML reports... 
2023-01-11T21:48:12.9069916Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214052.xml 2023-01-11T21:48:12.9070293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9070470Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9070851Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9071028Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9071067Z 2023-01-11T21:48:12.9071159Z Running tests... 2023-01-11T21:48:12.9071423Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9071738Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9071978Z test_isend (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9072202Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15417 2023-01-11T21:48:12.9072421Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15418 2023-01-11T21:48:12.9072791Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9072967Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9073332Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9073526Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9073887Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9074060Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9074437Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9074633Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9074879Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9075124Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9075504Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9075901Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9076132Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9076362Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9076638Z [1673473264.426893] [7e0e28e30a97:15417:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9076933Z [1673473265.214991] [7e0e28e30a97:15417:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9077174Z [1673473265.214991] [7e0e28e30a97:15417:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9077488Z [1673473264.447339] [7e0e28e30a97:15418:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9077726Z [1673473265.223180] [7e0e28e30a97:15418:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9077965Z [1673473265.223180] [7e0e28e30a97:15418:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9078051Z ok (5.446s) 2023-01-11T21:48:12.9078074Z 2023-01-11T21:48:12.9078352Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9078466Z Ran 1 test in 5.446s 2023-01-11T21:48:12.9078486Z 2023-01-11T21:48:12.9078581Z OK 2023-01-11T21:48:12.9078599Z 2023-01-11T21:48:12.9078725Z Generating XML reports... 
2023-01-11T21:48:12.9079175Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214100.xml 2023-01-11T21:48:12.9079548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9079725Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9080104Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9080279Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9080298Z 2023-01-11T21:48:12.9080409Z Running tests... 2023-01-11T21:48:12.9080676Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9080988Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9081261Z test_isend_autograd_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9081481Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15527 2023-01-11T21:48:12.9081702Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15528 2023-01-11T21:48:12.9082073Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9082230Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9082608Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9082798Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9083165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9083340Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9083712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9083907Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9084153Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9084398Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9084778Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9085172Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9085461Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9085806Z STAGE:2023-01-11 21:41:12 15528:15528 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9086036Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9086411Z STAGE:2023-01-11 21:41:12 15527:15527 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9086696Z [1673473272.443693] [7e0e28e30a97:15527:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9086932Z [1673473273.501707] [7e0e28e30a97:15527:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9087175Z [1673473273.501707] [7e0e28e30a97:15527:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9087506Z STAGE:2023-01-11 21:41:13 15527:15527 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9087781Z [1673473272.444522] [7e0e28e30a97:15528:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9088016Z [1673473273.496036] [7e0e28e30a97:15528:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9088255Z [1673473273.496036] [7e0e28e30a97:15528:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9088593Z STAGE:2023-01-11 21:41:13 15528:15528 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9088942Z STAGE:2023-01-11 21:41:13 15527:15527 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9089290Z STAGE:2023-01-11 21:41:13 15528:15528 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9089401Z ok (5.937s) 2023-01-11T21:48:12.9089420Z 2023-01-11T21:48:12.9089687Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9089783Z Ran 1 test in 5.937s 2023-01-11T21:48:12.9089821Z 2023-01-11T21:48:12.9089897Z OK 2023-01-11T21:48:12.9089915Z 2023-01-11T21:48:12.9090040Z Generating XML reports... 2023-01-11T21:48:12.9090490Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214108.xml 2023-01-11T21:48:12.9090861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9091040Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9091420Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9091614Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9091637Z 2023-01-11T21:48:12.9091748Z Running tests... 
2023-01-11T21:48:12.9091993Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9092308Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9092572Z test_isend_torch_profiler (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9092798Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15641 2023-01-11T21:48:12.9093179Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15642 2023-01-11T21:48:12.9093564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9093742Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9094121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9094377Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9094754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9094933Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9095367Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9095565Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9095813Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9096060Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9096468Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9096868Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9097080Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9097415Z STAGE:2023-01-11 21:41:20 15641:15641 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9097648Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9097985Z STAGE:2023-01-11 21:41:20 15642:15642 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9098261Z [1673473280.934200] [7e0e28e30a97:15641:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9098494Z [1673473281.991100] [7e0e28e30a97:15641:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9098738Z [1673473281.991100] [7e0e28e30a97:15641:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9099080Z STAGE:2023-01-11 21:41:22 15641:15641 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9099354Z [1673473280.955124] [7e0e28e30a97:15642:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9099588Z [1673473282.012119] [7e0e28e30a97:15642:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9099808Z [1673473282.012119] [7e0e28e30a97:15642:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9100147Z STAGE:2023-01-11 21:41:22 15642:15642 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9100496Z STAGE:2023-01-11 21:41:22 15641:15641 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9100849Z STAGE:2023-01-11 21:41:22 15642:15642 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9100956Z ok (5.942s) 2023-01-11T21:48:12.9100976Z 2023-01-11T21:48:12.9101242Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9101357Z Ran 1 test in 5.942s 2023-01-11T21:48:12.9101376Z 2023-01-11T21:48:12.9101471Z OK 2023-01-11T21:48:12.9101490Z 2023-01-11T21:48:12.9101618Z Generating XML reports... 2023-01-11T21:48:12.9102045Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214117.xml 2023-01-11T21:48:12.9102417Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9102594Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9102973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9103240Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9103260Z 2023-01-11T21:48:12.9103372Z Running tests... 
2023-01-11T21:48:12.9103642Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9103956Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9104266Z test_monitored_barrier_allreduce_hang (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9104494Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15755 2023-01-11T21:48:12.9104713Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15756 2023-01-11T21:48:12.9105090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9105268Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9105650Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9105842Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9106204Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9106382Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9106740Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9106929Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9107174Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9107420Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9107821Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9108215Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9108446Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9108695Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.9108923Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9109145Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.9109545Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9109936Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9110183Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:48:12.9110424Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:48:12.9110812Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:48:12.9111204Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 
2023-01-11T21:48:12.9111436Z [E ProcessGroupGloo.cpp:138] [Rank 0]: Rank 1 failed to pass monitoredBarrier in 100 ms 2023-01-11T21:48:12.9111542Z ok (21.352s) 2023-01-11T21:48:12.9111562Z 2023-01-11T21:48:12.9111811Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9111927Z Ran 1 test in 21.352s 2023-01-11T21:48:12.9112000Z 2023-01-11T21:48:12.9112098Z OK 2023-01-11T21:48:12.9112117Z 2023-01-11T21:48:12.9112245Z Generating XML reports... 2023-01-11T21:48:12.9112697Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214125.xml 2023-01-11T21:48:12.9113067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9113291Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9113687Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9113860Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9113900Z 2023-01-11T21:48:12.9113992Z Running tests... 2023-01-11T21:48:12.9114255Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9114567Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9114871Z test_monitored_barrier_allreduce_hang_wait_all_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9115094Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15876 2023-01-11T21:48:12.9115314Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15877 2023-01-11T21:48:12.9115688Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9115865Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9116227Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9116419Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9116787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9116966Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9117345Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9117537Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9117784Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9118031Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9118413Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9118809Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9119041Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9119286Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.9119513Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9119749Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.9120150Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9120545Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9120784Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:48:12.9121026Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:48:12.9121403Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:48:12.9121864Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:48:12.9122099Z [E ProcessGroupGloo.cpp:2803] [Rank 0]: Rank 1 failed to pass monitoredBarrier in 100 ms 2023-01-11T21:48:12.9122329Z [E ProcessGroupGloo.cpp:138] [Rank 0]: Ranks 1 failed to pass monitoredBarrier in 100 ms 2023-01-11T21:48:12.9122487Z ok (21.391s) 2023-01-11T21:48:12.9122509Z 2023-01-11T21:48:12.9122786Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9122903Z Ran 1 test in 21.391s 2023-01-11T21:48:12.9122923Z 2023-01-11T21:48:12.9123017Z OK 2023-01-11T21:48:12.9123036Z 2023-01-11T21:48:12.9123143Z Generating XML reports... 
2023-01-11T21:48:12.9123593Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214149.xml 2023-01-11T21:48:12.9123969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9124149Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9124527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9124720Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9124744Z 2023-01-11T21:48:12.9124855Z Running tests... 2023-01-11T21:48:12.9125122Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9125434Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9125828Z test_monitored_barrier_failure_order (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.9125847Z 2023-01-11T21:48:12.9126111Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9126228Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9126248Z 2023-01-11T21:48:12.9126357Z OK (skipped=1) 2023-01-11T21:48:12.9126376Z 2023-01-11T21:48:12.9126502Z Generating XML reports... 
2023-01-11T21:48:12.9126946Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214213.xml 2023-01-11T21:48:12.9127315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9127493Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9127871Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9128046Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9128065Z 2023-01-11T21:48:12.9128177Z Running tests... 2023-01-11T21:48:12.9128444Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9128753Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9129145Z test_monitored_barrier_gloo (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.9129166Z 2023-01-11T21:48:12.9129424Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9129541Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9129561Z 2023-01-11T21:48:12.9129674Z OK (skipped=1) 2023-01-11T21:48:12.9129693Z 2023-01-11T21:48:12.9129817Z Generating XML reports... 
2023-01-11T21:48:12.9130241Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214215.xml 2023-01-11T21:48:12.9130613Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9130791Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9131239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9131433Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9131452Z 2023-01-11T21:48:12.9131564Z Running tests... 2023-01-11T21:48:12.9131830Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9132185Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9132598Z test_monitored_barrier_gloo_rank_0_timeout (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.9132639Z 2023-01-11T21:48:12.9133016Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9133137Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9133156Z 2023-01-11T21:48:12.9133266Z OK (skipped=1) 2023-01-11T21:48:12.9133290Z 2023-01-11T21:48:12.9133417Z Generating XML reports... 
2023-01-11T21:48:12.9133865Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214218.xml 2023-01-11T21:48:12.9134233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9134412Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9134794Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9134969Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9135007Z 2023-01-11T21:48:12.9135098Z Running tests... 2023-01-11T21:48:12.9135361Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9135670Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9136085Z test_monitored_barrier_gloo_subgroup (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.9136105Z 2023-01-11T21:48:12.9136367Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9136480Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9136500Z 2023-01-11T21:48:12.9136610Z OK (skipped=1) 2023-01-11T21:48:12.9136629Z 2023-01-11T21:48:12.9136758Z Generating XML reports... 
2023-01-11T21:48:12.9137181Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214220.xml 2023-01-11T21:48:12.9137552Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9137729Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9138109Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9138306Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9138325Z 2023-01-11T21:48:12.9138436Z Running tests... 2023-01-11T21:48:12.9138701Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9139010Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9139404Z test_monitored_barrier_wait_all_ranks (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.9139443Z 2023-01-11T21:48:12.9139685Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9139798Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9139818Z 2023-01-11T21:48:12.9139927Z OK (skipped=1) 2023-01-11T21:48:12.9139946Z 2023-01-11T21:48:12.9140070Z Generating XML reports... 
2023-01-11T21:48:12.9140512Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214222.xml 2023-01-11T21:48:12.9140974Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9141153Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9141531Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9141761Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9141802Z 2023-01-11T21:48:12.9141897Z Running tests... 2023-01-11T21:48:12.9142168Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9142481Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9142887Z test_nccl_backend_bool_allgather (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.002s) 2023-01-11T21:48:12.9142910Z 2023-01-11T21:48:12.9143172Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9143286Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9143306Z 2023-01-11T21:48:12.9143415Z OK (skipped=1) 2023-01-11T21:48:12.9143434Z 2023-01-11T21:48:12.9143559Z Generating XML reports... 
2023-01-11T21:48:12.9143980Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214225.xml 2023-01-11T21:48:12.9144349Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9144526Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9144905Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9145097Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9145116Z 2023-01-11T21:48:12.9145232Z Running tests... 2023-01-11T21:48:12.9145497Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9145806Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9146208Z test_nccl_backend_bool_allreduce (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.002s) 2023-01-11T21:48:12.9146228Z 2023-01-11T21:48:12.9146474Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9146588Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9146607Z 2023-01-11T21:48:12.9146717Z OK (skipped=1) 2023-01-11T21:48:12.9146735Z 2023-01-11T21:48:12.9146862Z Generating XML reports... 
2023-01-11T21:48:12.9147304Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214227.xml 2023-01-11T21:48:12.9147674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9147858Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9148236Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9148426Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9148445Z 2023-01-11T21:48:12.9148536Z Running tests... 2023-01-11T21:48:12.9148803Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9149113Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9149510Z test_nccl_backend_bool_broadcast (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.002s) 2023-01-11T21:48:12.9149529Z 2023-01-11T21:48:12.9149791Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9149903Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9149978Z 2023-01-11T21:48:12.9150093Z OK (skipped=1) 2023-01-11T21:48:12.9150113Z 2023-01-11T21:48:12.9150238Z Generating XML reports... 
2023-01-11T21:48:12.9150684Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214230.xml 2023-01-11T21:48:12.9151033Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9151258Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9151647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9151838Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9151857Z 2023-01-11T21:48:12.9151967Z Running tests... 2023-01-11T21:48:12.9152229Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9152538Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9152939Z test_nccl_backend_bool_reduce (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.003s) 2023-01-11T21:48:12.9152959Z 2023-01-11T21:48:12.9153219Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9153313Z Ran 1 test in 0.003s 2023-01-11T21:48:12.9153332Z 2023-01-11T21:48:12.9153442Z OK (skipped=1) 2023-01-11T21:48:12.9153461Z 2023-01-11T21:48:12.9153590Z Generating XML reports... 
2023-01-11T21:48:12.9154029Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214232.xml 2023-01-11T21:48:12.9154395Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9154571Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9154945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9155141Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9155160Z 2023-01-11T21:48:12.9155251Z Running tests... 2023-01-11T21:48:12.9155514Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9155824Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9156126Z test_nccl_high_priority_stream (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL backend supports high priority stream (0.002s) 2023-01-11T21:48:12.9156146Z 2023-01-11T21:48:12.9156406Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9156543Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9156563Z 2023-01-11T21:48:12.9156673Z OK (skipped=1) 2023-01-11T21:48:12.9156692Z 2023-01-11T21:48:12.9156817Z Generating XML reports... 
2023-01-11T21:48:12.9157263Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214234.xml 2023-01-11T21:48:12.9157617Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9157793Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9158172Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9158366Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9158385Z 2023-01-11T21:48:12.9158498Z Running tests... 2023-01-11T21:48:12.9158762Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9159071Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9159321Z test_new_subgroups (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T21:48:12.9159400Z 2023-01-11T21:48:12.9159670Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9159766Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9159785Z 2023-01-11T21:48:12.9159896Z OK (skipped=1) 2023-01-11T21:48:12.9159915Z 2023-01-11T21:48:12.9160040Z Generating XML reports... 
2023-01-11T21:48:12.9160481Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214237.xml 2023-01-11T21:48:12.9160903Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9161089Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9161475Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9161668Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9161687Z 2023-01-11T21:48:12.9161783Z Running tests... 2023-01-11T21:48:12.9162050Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9162359Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9162632Z test_new_subgroups_by_enumeration (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T21:48:12.9162652Z 2023-01-11T21:48:12.9162915Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9163030Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9163049Z 2023-01-11T21:48:12.9163158Z OK (skipped=1) 2023-01-11T21:48:12.9163177Z 2023-01-11T21:48:12.9163303Z Generating XML reports... 
2023-01-11T21:48:12.9163742Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214239.xml 2023-01-11T21:48:12.9164091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9164275Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9164653Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9164844Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9164864Z 2023-01-11T21:48:12.9164974Z Running tests... 2023-01-11T21:48:12.9165237Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9165548Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9165859Z test_new_subgroups_by_enumeration_input_rank_exceeds_world_size (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T21:48:12.9165879Z 2023-01-11T21:48:12.9166143Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9166236Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9166259Z 2023-01-11T21:48:12.9166369Z OK (skipped=1) 2023-01-11T21:48:12.9166388Z 2023-01-11T21:48:12.9166513Z Generating XML reports... 
2023-01-11T21:48:12.9166960Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214241.xml 2023-01-11T21:48:12.9167327Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9167508Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9167889Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9168080Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9168100Z 2023-01-11T21:48:12.9168210Z Running tests... 2023-01-11T21:48:12.9168455Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9168765Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9169138Z test_new_subgroups_by_enumeration_negative_input_rank (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9169361Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16426 2023-01-11T21:48:12.9169582Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16427 2023-01-11T21:48:12.9170006Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9170191Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9170570Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9170742Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9171103Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9171284Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9171656Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9171844Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9172097Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9172345Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9172743Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9173290Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9173505Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9173739Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9173845Z ok (4.229s) 2023-01-11T21:48:12.9173865Z 2023-01-11T21:48:12.9174134Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9174247Z Ran 1 test in 4.229s 2023-01-11T21:48:12.9174267Z 2023-01-11T21:48:12.9174362Z OK 2023-01-11T21:48:12.9174381Z 2023-01-11T21:48:12.9174511Z Generating XML reports... 2023-01-11T21:48:12.9174956Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214244.xml 2023-01-11T21:48:12.9175304Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9175484Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9175862Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9176061Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9176081Z 2023-01-11T21:48:12.9176191Z Running tests... 2023-01-11T21:48:12.9176456Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9176765Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9177063Z test_new_subgroups_group_size_exceeds_world_size (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9177284Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16529 2023-01-11T21:48:12.9177484Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16530 2023-01-11T21:48:12.9177851Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9178029Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9178498Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9178693Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9179054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9179288Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9179674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9179844Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9180092Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9180336Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9180738Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9181132Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9181365Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9181598Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9181703Z ok (4.326s) 2023-01-11T21:48:12.9181722Z 2023-01-11T21:48:12.9181990Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9182084Z Ran 1 test in 4.326s 2023-01-11T21:48:12.9182103Z 2023-01-11T21:48:12.9182199Z OK 2023-01-11T21:48:12.9182218Z 2023-01-11T21:48:12.9182344Z Generating XML reports... 2023-01-11T21:48:12.9182792Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214251.xml 2023-01-11T21:48:12.9183168Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9183346Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9183724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9183921Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9183940Z 2023-01-11T21:48:12.9184052Z Running tests... 2023-01-11T21:48:12.9184295Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9184608Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9184885Z test_new_subgroups_overlap_not_allowed (__main__.TestDistBackendWithSpawn) ... 
skip: Test requires world size of 4 (0.002s) 2023-01-11T21:48:12.9184908Z 2023-01-11T21:48:12.9185171Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9185285Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9185304Z 2023-01-11T21:48:12.9185412Z OK (skipped=1) 2023-01-11T21:48:12.9185431Z 2023-01-11T21:48:12.9185555Z Generating XML reports... 2023-01-11T21:48:12.9185996Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214258.xml 2023-01-11T21:48:12.9186367Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9186527Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9186905Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9187097Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9187116Z 2023-01-11T21:48:12.9187288Z Running tests... 2023-01-11T21:48:12.9187555Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9187866Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9188167Z test_new_subgroups_world_size_not_divisible_by_group_size (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T21:48:12.9188187Z 2023-01-11T21:48:12.9188491Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9188592Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9188630Z 2023-01-11T21:48:12.9188722Z OK (skipped=1) 2023-01-11T21:48:12.9188741Z 2023-01-11T21:48:12.9188868Z Generating XML reports... 
2023-01-11T21:48:12.9189321Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214300.xml 2023-01-11T21:48:12.9189692Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9189873Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9190251Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9190443Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9190462Z 2023-01-11T21:48:12.9190574Z Running tests... 2023-01-11T21:48:12.9190822Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9191133Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9191411Z test_output_unused_in_loss_dict_module (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9192156Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78112 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.618s) 2023-01-11T21:48:12.9192180Z 2023-01-11T21:48:12.9192443Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9192558Z Ran 1 test in 1.618s 2023-01-11T21:48:12.9192577Z 2023-01-11T21:48:12.9192687Z OK (skipped=1) 2023-01-11T21:48:12.9192706Z 2023-01-11T21:48:12.9192831Z Generating XML reports... 
2023-01-11T21:48:12.9193278Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214302.xml 2023-01-11T21:48:12.9193647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9193808Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9194188Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9194384Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9194403Z 2023-01-11T21:48:12.9194514Z Running tests... 2023-01-11T21:48:12.9194782Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9195093Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9195377Z test_output_unused_in_loss_tuple_module (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9195599Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16732 2023-01-11T21:48:12.9195799Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16733 2023-01-11T21:48:12.9196170Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9196346Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9196726Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9196980Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9197352Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9197527Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9197940Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9198135Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9198365Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9198613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9199018Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9199415Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9199647Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9199877Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9200161Z [1673473391.567889] [7e0e28e30a97:16733:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9200396Z [1673473391.573322] [7e0e28e30a97:16733:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9200638Z [1673473391.573322] [7e0e28e30a97:16733:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9200891Z [1673473391.560836] [7e0e28e30a97:16732:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9201129Z [1673473391.566240] [7e0e28e30a97:16732:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9201369Z [1673473391.566240] [7e0e28e30a97:16732:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9201480Z ok (5.938s) 2023-01-11T21:48:12.9201499Z 2023-01-11T21:48:12.9201770Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9201884Z Ran 1 test in 5.938s 2023-01-11T21:48:12.9201904Z 2023-01-11T21:48:12.9201999Z OK 2023-01-11T21:48:12.9202018Z 2023-01-11T21:48:12.9202144Z Generating XML reports... 
2023-01-11T21:48:12.9202591Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214306.xml 2023-01-11T21:48:12.9202946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9203124Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9203503Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9203695Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9203714Z 2023-01-11T21:48:12.9203828Z Running tests... 2023-01-11T21:48:12.9204092Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9204403Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9204675Z test_periodic_model_averager (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9204896Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16850 2023-01-11T21:48:12.9205159Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16851 2023-01-11T21:48:12.9205537Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9205715Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9206092Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9206329Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9206709Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9206885Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9207263Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9207434Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9207684Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9207930Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9208330Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9208731Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9208963Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9209192Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9209470Z [1673473400.414562] [7e0e28e30a97:16850:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9209747Z [1673473400.417443] [7e0e28e30a97:16851:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9209978Z [1673473400.421855] [7e0e28e30a97:16850:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9210203Z [1673473400.421855] [7e0e28e30a97:16850:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9210436Z [1673473400.422347] [7e0e28e30a97:16851:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9210676Z [1673473400.422347] [7e0e28e30a97:16851:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9210787Z ok (5.942s) 2023-01-11T21:48:12.9210807Z 2023-01-11T21:48:12.9211077Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9211196Z Ran 1 test in 5.942s 2023-01-11T21:48:12.9211216Z 2023-01-11T21:48:12.9211311Z OK 2023-01-11T21:48:12.9211329Z 2023-01-11T21:48:12.9211455Z Generating XML reports... 
2023-01-11T21:48:12.9211900Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214315.xml 2023-01-11T21:48:12.9212253Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9212435Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9212814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9213159Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9213181Z 2023-01-11T21:48:12.9213294Z Running tests... 2023-01-11T21:48:12.9213564Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9214003Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9214294Z test_periodic_model_averager_param_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9214495Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16965 2023-01-11T21:48:12.9214715Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16966 2023-01-11T21:48:12.9215144Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9215330Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9215717Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9215912Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9216274Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9216456Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9216829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9217001Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9217250Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9217497Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9217900Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9218296Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9218529Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9218761Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9219040Z [1673473408.912075] [7e0e28e30a97:16965:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9219315Z [1673473408.918650] [7e0e28e30a97:16966:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9219530Z [1673473408.918951] [7e0e28e30a97:16965:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9219769Z [1673473408.918951] [7e0e28e30a97:16965:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9219997Z [1673473408.923733] [7e0e28e30a97:16966:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9220235Z [1673473408.923733] [7e0e28e30a97:16966:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9220341Z ok (5.927s) 2023-01-11T21:48:12.9220362Z 2023-01-11T21:48:12.9220631Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9220746Z Ran 1 test in 5.928s 2023-01-11T21:48:12.9220765Z 2023-01-11T21:48:12.9220861Z OK 2023-01-11T21:48:12.9220880Z 2023-01-11T21:48:12.9221009Z Generating XML reports... 
2023-01-11T21:48:12.9221442Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214323.xml 2023-01-11T21:48:12.9221813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9221993Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9222374Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9222627Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9222647Z 2023-01-11T21:48:12.9222759Z Running tests... 2023-01-11T21:48:12.9223032Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9223343Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9223671Z test_post_localSGD_optimizer_parity (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9224410Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77123 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.633s) 2023-01-11T21:48:12.9224450Z 2023-01-11T21:48:12.9224695Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9224816Z Ran 1 test in 1.634s 2023-01-11T21:48:12.9224835Z 2023-01-11T21:48:12.9224946Z OK (skipped=1) 2023-01-11T21:48:12.9224965Z 2023-01-11T21:48:12.9225092Z Generating XML reports... 
2023-01-11T21:48:12.9225538Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214332.xml 2023-01-11T21:48:12.9225912Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9226091Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9226473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9226665Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9226685Z 2023-01-11T21:48:12.9226775Z Running tests... 2023-01-11T21:48:12.9227047Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9227361Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9227655Z test_post_localSGD_optimizer_parity_grad_is_view (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9228399Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77292 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.610s) 2023-01-11T21:48:12.9228420Z 2023-01-11T21:48:12.9228679Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9228793Z Ran 1 test in 1.610s 2023-01-11T21:48:12.9228812Z 2023-01-11T21:48:12.9228922Z OK (skipped=1) 2023-01-11T21:48:12.9228941Z 2023-01-11T21:48:12.9229069Z Generating XML reports... 
2023-01-11T21:48:12.9229497Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214336.xml 2023-01-11T21:48:12.9229908Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9230087Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9230472Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9230666Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9230686Z 2023-01-11T21:48:12.9230797Z Running tests... 2023-01-11T21:48:12.9231061Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9231370Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9231678Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9231954Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17148 2023-01-11T21:48:12.9232175Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17149 2023-01-11T21:48:12.9232554Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9232777Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9233170Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9233362Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9233726Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9233901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9234257Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9234447Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9234693Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9234939Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9235344Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9235744Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9235976Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9236203Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9236359Z skip: Need at least 4 CUDA devices (4.211s) 2023-01-11T21:48:12.9236380Z 2023-01-11T21:48:12.9236628Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9236744Z Ran 1 test in 4.211s 2023-01-11T21:48:12.9236763Z 2023-01-11T21:48:12.9236875Z OK (skipped=1) 2023-01-11T21:48:12.9236894Z 2023-01-11T21:48:12.9237021Z Generating XML reports... 2023-01-11T21:48:12.9237470Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214340.xml 2023-01-11T21:48:12.9237842Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9238021Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9238401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9238597Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9238619Z 2023-01-11T21:48:12.9238712Z Running tests... 2023-01-11T21:48:12.9238977Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9239294Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9239625Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9239848Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17251 2023-01-11T21:48:12.9240066Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17252 2023-01-11T21:48:12.9240438Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9240615Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9240975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9241227Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9241597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9241772Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9242194Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9242393Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9242641Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9242886Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9243292Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9243671Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9243903Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9244135Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9244291Z skip: Need at least 4 CUDA devices (4.219s) 2023-01-11T21:48:12.9244312Z 2023-01-11T21:48:12.9244580Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9244696Z Ran 1 test in 4.219s 2023-01-11T21:48:12.9244715Z 2023-01-11T21:48:12.9244825Z OK (skipped=1) 2023-01-11T21:48:12.9244844Z 2023-01-11T21:48:12.9244971Z Generating XML reports... 2023-01-11T21:48:12.9245417Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214347.xml 2023-01-11T21:48:12.9245769Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9245948Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9246326Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9246524Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9246543Z 2023-01-11T21:48:12.9246656Z Running tests... 2023-01-11T21:48:12.9246920Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9247231Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9247516Z test_post_localSGD_optimizer_step_reload (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9248254Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/84886 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.611s) 2023-01-11T21:48:12.9248278Z 2023-01-11T21:48:12.9248541Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9248637Z Ran 1 test in 1.612s 2023-01-11T21:48:12.9248656Z 2023-01-11T21:48:12.9248770Z OK (skipped=1) 2023-01-11T21:48:12.9248790Z 2023-01-11T21:48:12.9248919Z Generating XML reports... 2023-01-11T21:48:12.9249364Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214354.xml 2023-01-11T21:48:12.9249734Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9249911Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9250361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9250553Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9250573Z 2023-01-11T21:48:12.9250663Z Running tests... 2023-01-11T21:48:12.9250927Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9251283Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9251558Z test_reduce_full_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9251781Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17388 2023-01-11T21:48:12.9252000Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17389 2023-01-11T21:48:12.9252376Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9252559Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9253094Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9253276Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9253646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9253828Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9254203Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9254397Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9254645Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9254889Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9255292Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9255666Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9255898Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9256143Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.9256372Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9256608Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.9257027Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9257426Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9257762Z STAGE:2023-01-11 21:44:02 17388:17388 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9258084Z STAGE:2023-01-11 21:44:02 17389:17389 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9258348Z [1673473442.104097] [7e0e28e30a97:17389:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9258586Z [1673473443.173695] [7e0e28e30a97:17389:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9258828Z [1673473443.173695] [7e0e28e30a97:17389:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9259103Z [1673473442.104073] [7e0e28e30a97:17388:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9259427Z [1673473443.150092] [7e0e28e30a97:17388:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9259665Z [1673473443.150092] [7e0e28e30a97:17388:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9260278Z STAGE:2023-01-11 21:44:03 17389:17389 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:44:03 17388:17388 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9260302Z 2023-01-11T21:48:12.9260887Z STAGE:2023-01-11 21:44:03 17388:17388 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:44:03 17389:17389 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9260908Z 2023-01-11T21:48:12.9261234Z STAGE:2023-01-11 21:44:03 17389:17389 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9261559Z STAGE:2023-01-11 21:44:03 17388:17388 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9261895Z STAGE:2023-01-11 21:44:03 17389:17389 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9262448Z STAGE:2023-01-11 21:44:03 17388:17388 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:44:03 17389:17389 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9262468Z 2023-01-11T21:48:12.9262816Z STAGE:2023-01-11 21:44:03 17388:17388 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9262902Z ok (5.839s) 2023-01-11T21:48:12.9262921Z 2023-01-11T21:48:12.9263191Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9263307Z Ran 1 test in 5.839s 2023-01-11T21:48:12.9263326Z 2023-01-11T21:48:12.9263420Z OK 2023-01-11T21:48:12.9263439Z 2023-01-11T21:48:12.9263570Z Generating XML reports... 
2023-01-11T21:48:12.9264019Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214358.xml 2023-01-11T21:48:12.9264389Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9264570Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9264934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9265129Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9265148Z 2023-01-11T21:48:12.9265261Z Running tests... 2023-01-11T21:48:12.9265527Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9265840Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9266107Z test_reduce_full_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9266335Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17502 2023-01-11T21:48:12.9266553Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17503 2023-01-11T21:48:12.9266923Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9267084Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9267469Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9267663Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9268029Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9268207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9268654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9268845Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9269091Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9269362Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9269776Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9270171Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9270403Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9270762Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.9270993Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9271230Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.9271724Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9272072Z STAGE:2023-01-11 21:44:10 17503:17503 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9272446Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9272776Z STAGE:2023-01-11 21:44:10 17502:17502 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9273057Z [1673473450.564474] [7e0e28e30a97:17503:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9273453Z [1673473451.585270] [7e0e28e30a97:17503:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9273710Z [1673473451.585270] [7e0e28e30a97:17503:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9273988Z [1673473450.543478] [7e0e28e30a97:17502:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9274222Z [1673473451.585630] [7e0e28e30a97:17502:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9274463Z [1673473451.585630] [7e0e28e30a97:17502:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9275024Z STAGE:2023-01-11 21:44:11 17503:17503 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:44:11 17502:17502 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9275048Z 2023-01-11T21:48:12.9275617Z STAGE:2023-01-11 21:44:11 17503:17503 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:44:11 17502:17502 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9275638Z 2023-01-11T21:48:12.9275970Z STAGE:2023-01-11 21:44:11 17502:17502 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9276296Z STAGE:2023-01-11 21:44:11 17503:17503 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9276611Z STAGE:2023-01-11 21:44:11 17502:17502 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9276940Z STAGE:2023-01-11 21:44:11 17503:17503 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9277286Z STAGE:2023-01-11 21:44:11 17502:17502 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9277712Z STAGE:2023-01-11 21:44:11 17503:17503 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9277817Z ok (5.833s) 2023-01-11T21:48:12.9277837Z 2023-01-11T21:48:12.9278104Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9278220Z Ran 1 test in 5.834s 2023-01-11T21:48:12.9278239Z 2023-01-11T21:48:12.9278335Z OK 2023-01-11T21:48:12.9278354Z 2023-01-11T21:48:12.9278481Z Generating XML reports... 
2023-01-11T21:48:12.9278962Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214406.xml 2023-01-11T21:48:12.9279348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9279529Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9279910Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9280109Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9280129Z 2023-01-11T21:48:12.9280241Z Running tests... 2023-01-11T21:48:12.9280510Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9280827Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9281084Z test_reduce_full_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9281309Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17616 2023-01-11T21:48:12.9281528Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17617 2023-01-11T21:48:12.9281900Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9282077Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9282462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9282657Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9283026Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9283203Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9283567Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9283762Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9284009Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9284254Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9284657Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9285057Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9285288Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9285537Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.9285744Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9285982Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.9286377Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9286767Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9287173Z STAGE:2023-01-11 21:44:18 17617:17617 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9287495Z STAGE:2023-01-11 21:44:18 17616:17616 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9287775Z [1673473458.967222] [7e0e28e30a97:17617:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9288056Z [1673473460.032894] [7e0e28e30a97:17617:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9288305Z [1673473460.032894] [7e0e28e30a97:17617:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9288580Z [1673473458.945494] [7e0e28e30a97:17616:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9288795Z [1673473460.016214] [7e0e28e30a97:17616:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9289033Z [1673473460.016214] [7e0e28e30a97:17616:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9289591Z STAGE:2023-01-11 21:44:20 17617:17617 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:44:20 17616:17616 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9289612Z 2023-01-11T21:48:12.9289962Z STAGE:2023-01-11 21:44:20 17617:17617 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9290309Z STAGE:2023-01-11 21:44:20 17616:17616 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9290638Z STAGE:2023-01-11 21:44:20 17617:17617 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9290959Z STAGE:2023-01-11 21:44:20 17616:17616 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9291294Z STAGE:2023-01-11 21:44:20 17617:17617 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9291620Z STAGE:2023-01-11 21:44:20 17616:17616 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9291966Z STAGE:2023-01-11 21:44:20 17617:17617 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9292292Z STAGE:2023-01-11 21:44:20 17616:17616 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9292399Z ok (5.938s) 2023-01-11T21:48:12.9292419Z 2023-01-11T21:48:12.9292684Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9292799Z Ran 1 test in 5.938s 2023-01-11T21:48:12.9292819Z 2023-01-11T21:48:12.9293055Z OK 2023-01-11T21:48:12.9293075Z 2023-01-11T21:48:12.9293208Z Generating XML reports... 
2023-01-11T21:48:12.9293665Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214414.xml 2023-01-11T21:48:12.9294043Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9294223Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9294585Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9294785Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9294804Z 2023-01-11T21:48:12.9294917Z Running tests... 2023-01-11T21:48:12.9295184Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9295499Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9295765Z test_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9296096Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17730 2023-01-11T21:48:12.9296317Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17731 2023-01-11T21:48:12.9296673Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9296853Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9297301Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9297504Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9297880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9298061Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9298442Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9298639Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9298886Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9299111Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9299519Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9299918Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9300149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9300397Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.9300624Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9300870Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.9301267Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9301663Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9301980Z STAGE:2023-01-11 21:44:27 17731:17731 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9302304Z STAGE:2023-01-11 21:44:27 17730:17730 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9302585Z [1673473467.399271] [7e0e28e30a97:17731:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9302822Z [1673473468.432681] [7e0e28e30a97:17731:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9303067Z [1673473468.432681] [7e0e28e30a97:17731:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9303412Z STAGE:2023-01-11 21:44:28 17731:17731 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9303690Z [1673473467.378736] [7e0e28e30a97:17730:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9303922Z [1673473468.422891] [7e0e28e30a97:17730:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9304162Z [1673473468.422891] [7e0e28e30a97:17730:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9304501Z STAGE:2023-01-11 21:44:28 17730:17730 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9304903Z STAGE:2023-01-11 21:44:28 17731:17731 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9305252Z STAGE:2023-01-11 21:44:28 17730:17730 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9305581Z STAGE:2023-01-11 21:44:28 17731:17731 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9305955Z STAGE:2023-01-11 21:44:28 17730:17730 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9306303Z STAGE:2023-01-11 21:44:28 17731:17731 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9306649Z STAGE:2023-01-11 21:44:28 17731:17731 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9306982Z STAGE:2023-01-11 21:44:28 17730:17730 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9307325Z STAGE:2023-01-11 21:44:28 17730:17730 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9307436Z ok (5.824s) 2023-01-11T21:48:12.9307456Z 2023-01-11T21:48:12.9307702Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9307818Z Ran 1 test in 5.824s 2023-01-11T21:48:12.9307837Z 2023-01-11T21:48:12.9307933Z OK 2023-01-11T21:48:12.9307952Z 2023-01-11T21:48:12.9308080Z Generating XML reports... 
2023-01-11T21:48:12.9308535Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214423.xml 2023-01-11T21:48:12.9308913Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9309092Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9309475Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9309649Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9309692Z 2023-01-11T21:48:12.9309787Z Running tests... 2023-01-11T21:48:12.9310055Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9310369Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9310632Z test_reduce_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9310856Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17844 2023-01-11T21:48:12.9311077Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17845 2023-01-11T21:48:12.9311452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9311630Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9311991Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9312189Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9312556Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9312733Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9313117Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9313310Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9313554Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9313798Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9314179Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9314650Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9314883Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9315113Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9315274Z skip: Skipped due to small world size. (4.227s) 2023-01-11T21:48:12.9315341Z 2023-01-11T21:48:12.9315624Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9315741Z Ran 1 test in 4.228s 2023-01-11T21:48:12.9315761Z 2023-01-11T21:48:12.9315872Z OK (skipped=1) 2023-01-11T21:48:12.9315892Z 2023-01-11T21:48:12.9316020Z Generating XML reports... 2023-01-11T21:48:12.9316450Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214431.xml 2023-01-11T21:48:12.9316823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9317005Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9317387Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9317580Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9317599Z 2023-01-11T21:48:12.9317715Z Running tests... 2023-01-11T21:48:12.9317984Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9318301Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9318557Z test_reduce_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9318758Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17947 2023-01-11T21:48:12.9318977Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17948 2023-01-11T21:48:12.9319354Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9319531Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9319912Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9320111Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9320481Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9320659Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9321018Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9321211Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9321462Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9321707Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9322108Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9322510Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9322744Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9322972Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9323132Z skip: Skipped due to small world size. (4.221s) 2023-01-11T21:48:12.9323153Z 2023-01-11T21:48:12.9323402Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9323586Z Ran 1 test in 4.222s 2023-01-11T21:48:12.9323606Z 2023-01-11T21:48:12.9323717Z OK (skipped=1) 2023-01-11T21:48:12.9323736Z 2023-01-11T21:48:12.9323866Z Generating XML reports... 2023-01-11T21:48:12.9324322Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214438.xml 2023-01-11T21:48:12.9324738Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9324923Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9325308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9325501Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9325521Z 2023-01-11T21:48:12.9325612Z Running tests... 2023-01-11T21:48:12.9325879Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9326200Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9326470Z test_reduce_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9326693Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18050 2023-01-11T21:48:12.9326914Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18051 2023-01-11T21:48:12.9327292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9327469Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9327829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9328022Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9328386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9328566Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9328946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9329135Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9329386Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9329631Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9330027Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9330403Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9330640Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9330871Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9331032Z skip: Skipped due to small world size. (4.209s) 2023-01-11T21:48:12.9331051Z 2023-01-11T21:48:12.9331319Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9331434Z Ran 1 test in 4.210s 2023-01-11T21:48:12.9331454Z 2023-01-11T21:48:12.9331569Z OK (skipped=1) 2023-01-11T21:48:12.9331589Z 2023-01-11T21:48:12.9331716Z Generating XML reports... 2023-01-11T21:48:12.9332144Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214445.xml 2023-01-11T21:48:12.9332514Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9332692Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9333294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9333490Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9333510Z 2023-01-11T21:48:12.9333622Z Running tests... 2023-01-11T21:48:12.9333890Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9334279Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9334550Z test_reduce_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9334753Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18153 2023-01-11T21:48:12.9334974Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18154 2023-01-11T21:48:12.9335350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9335532Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9335913Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9336106Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9336474Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9336652Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9337013Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9337204Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9337451Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9337695Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9338098Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9338494Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9338723Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9338957Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9339119Z skip: Skipped due to small world size. (4.259s) 2023-01-11T21:48:12.9339139Z 2023-01-11T21:48:12.9339386Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9339502Z Ran 1 test in 4.259s 2023-01-11T21:48:12.9339521Z 2023-01-11T21:48:12.9339631Z OK (skipped=1) 2023-01-11T21:48:12.9339650Z 2023-01-11T21:48:12.9339777Z Generating XML reports... 2023-01-11T21:48:12.9340231Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214452.xml 2023-01-11T21:48:12.9340605Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9340784Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9341165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9341357Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9341376Z 2023-01-11T21:48:12.9341468Z Running tests... 2023-01-11T21:48:12.9341737Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9342051Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9342299Z test_reduce_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9342607Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18256 2023-01-11T21:48:12.9342827Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18257 2023-01-11T21:48:12.9343204Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9343382Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9343789Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9343989Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9344364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9344544Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9344923Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9345115Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9345362Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9345607Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9346007Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9346382Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9346611Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9346950Z STAGE:2023-01-11 21:45:02 18256:18256 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9347187Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9347522Z STAGE:2023-01-11 21:45:02 18257:18257 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9347802Z [1673473502.923998] [7e0e28e30a97:18257:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9348042Z [1673473503.936772] [7e0e28e30a97:18257:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9348286Z [1673473503.936772] [7e0e28e30a97:18257:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9348560Z [1673473502.903587] [7e0e28e30a97:18256:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9348788Z [1673473503.963515] [7e0e28e30a97:18256:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9349012Z [1673473503.963515] [7e0e28e30a97:18256:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9349569Z STAGE:2023-01-11 21:45:04 18257:18257 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:45:04 18256:18256 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9349594Z 2023-01-11T21:48:12.9349949Z STAGE:2023-01-11 21:45:04 18256:18256 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9350298Z STAGE:2023-01-11 21:45:04 18257:18257 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9350628Z STAGE:2023-01-11 21:45:04 18257:18257 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9350954Z STAGE:2023-01-11 21:45:04 18256:18256 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9351357Z STAGE:2023-01-11 21:45:04 18257:18257 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9351686Z STAGE:2023-01-11 21:45:04 18256:18256 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9352032Z STAGE:2023-01-11 21:45:04 18257:18257 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9352449Z STAGE:2023-01-11 21:45:04 18256:18256 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9352542Z ok (5.954s) 2023-01-11T21:48:12.9352561Z 2023-01-11T21:48:12.9352835Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9352955Z Ran 1 test in 5.954s 2023-01-11T21:48:12.9352974Z 2023-01-11T21:48:12.9353070Z OK 2023-01-11T21:48:12.9353089Z 2023-01-11T21:48:12.9353220Z Generating XML reports... 
2023-01-11T21:48:12.9353674Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214458.xml 2023-01-11T21:48:12.9354050Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9354229Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9354592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9354788Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9354808Z 2023-01-11T21:48:12.9354920Z Running tests... 2023-01-11T21:48:12.9355189Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9355504Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9355755Z test_reduce_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9355977Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18370 2023-01-11T21:48:12.9356200Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18371 2023-01-11T21:48:12.9356551Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9356730Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9357112Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9357307Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9357696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9357873Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9358255Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9358450Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9358694Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9358919Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9359322Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9359720Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9359954Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9360290Z STAGE:2023-01-11 21:45:11 18371:18371 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9360519Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9360939Z STAGE:2023-01-11 21:45:11 18370:18370 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9361219Z [1673473511.389658] [7e0e28e30a97:18371:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9361455Z [1673473512.429125] [7e0e28e30a97:18371:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9361723Z [1673473512.429125] [7e0e28e30a97:18371:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9362006Z [1673473511.369383] [7e0e28e30a97:18370:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9362238Z [1673473512.425428] [7e0e28e30a97:18370:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9362478Z [1673473512.425428] [7e0e28e30a97:18370:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9363035Z STAGE:2023-01-11 21:45:12 18371:18371 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:45:12 18370:18370 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9363057Z 2023-01-11T21:48:12.9363630Z STAGE:2023-01-11 21:45:12 18370:18370 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:45:12 18371:18371 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9363651Z 2023-01-11T21:48:12.9363978Z STAGE:2023-01-11 21:45:12 18371:18371 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9364302Z STAGE:2023-01-11 21:45:12 18370:18370 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9364636Z STAGE:2023-01-11 21:45:12 18371:18371 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9364967Z STAGE:2023-01-11 21:45:12 18370:18370 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9365301Z STAGE:2023-01-11 21:45:12 18371:18371 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9365638Z STAGE:2023-01-11 21:45:12 18370:18370 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9365725Z ok (5.840s) 2023-01-11T21:48:12.9365748Z 2023-01-11T21:48:12.9366009Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9366113Z Ran 1 test in 5.841s 2023-01-11T21:48:12.9366133Z 2023-01-11T21:48:12.9366217Z OK 2023-01-11T21:48:12.9366236Z 2023-01-11T21:48:12.9366351Z Generating XML reports... 
2023-01-11T21:48:12.9366787Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214507.xml 2023-01-11T21:48:12.9367157Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9367341Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9367702Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9367898Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9367917Z 2023-01-11T21:48:12.9368028Z Running tests... 2023-01-11T21:48:12.9368299Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9368615Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9368899Z test_reduce_multigpu (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl backend supports reduce multigpu (0.002s) 2023-01-11T21:48:12.9368919Z 2023-01-11T21:48:12.9369182Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9369359Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9369378Z 2023-01-11T21:48:12.9369492Z OK (skipped=1) 2023-01-11T21:48:12.9369512Z 2023-01-11T21:48:12.9369619Z Generating XML reports... 
2023-01-11T21:48:12.9370075Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214515.xml 2023-01-11T21:48:12.9370445Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9370669Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9371066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9371260Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9371280Z 2023-01-11T21:48:12.9371391Z Running tests... 2023-01-11T21:48:12.9371656Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9371954Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9372212Z test_reduce_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9372437Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18517 2023-01-11T21:48:12.9372657Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18518 2023-01-11T21:48:12.9373179Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9373363Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9373744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9373938Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9374307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9374469Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9374840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9375029Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9375279Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9375524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9375925Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9376323Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9376556Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9376896Z STAGE:2023-01-11 21:45:22 18518:18518 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9377102Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9377427Z STAGE:2023-01-11 21:45:22 18517:18517 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9377711Z [1673473522.229248] [7e0e28e30a97:18518:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9377947Z [1673473523.250473] [7e0e28e30a97:18518:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9378190Z [1673473523.250473] [7e0e28e30a97:18518:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9378462Z [1673473522.207505] [7e0e28e30a97:18517:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9378785Z [1673473523.260771] [7e0e28e30a97:18517:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9379024Z [1673473523.260771] [7e0e28e30a97:18517:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9379641Z STAGE:2023-01-11 21:45:23 18518:18518 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:45:23 18517:18517 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9379665Z 2023-01-11T21:48:12.9380246Z STAGE:2023-01-11 21:45:23 18518:18518 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:45:23 18517:18517 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9380267Z 2023-01-11T21:48:12.9380596Z STAGE:2023-01-11 21:45:23 18518:18518 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9380907Z STAGE:2023-01-11 21:45:23 18517:18517 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9381242Z STAGE:2023-01-11 21:45:23 18518:18518 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9381797Z STAGE:2023-01-11 21:45:23 18518:18518 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:45:23 18517:18517 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9381817Z 2023-01-11T21:48:12.9382166Z STAGE:2023-01-11 21:45:23 18517:18517 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9382273Z ok (5.926s) 2023-01-11T21:48:12.9382293Z 2023-01-11T21:48:12.9382561Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9382677Z Ran 1 test in 5.926s 2023-01-11T21:48:12.9382697Z 2023-01-11T21:48:12.9382793Z OK 2023-01-11T21:48:12.9382816Z 2023-01-11T21:48:12.9382944Z Generating XML reports... 
2023-01-11T21:48:12.9383373Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214518.xml 2023-01-11T21:48:12.9383742Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9383926Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9384312Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9384508Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9384528Z 2023-01-11T21:48:12.9384638Z Running tests... 2023-01-11T21:48:12.9384906Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9385219Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9385518Z test_reduce_scatter_tensor_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA reduce_scatter_tensor (0.002s) 2023-01-11T21:48:12.9385538Z 2023-01-11T21:48:12.9385801Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9385898Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9385917Z 2023-01-11T21:48:12.9386028Z OK (skipped=1) 2023-01-11T21:48:12.9386047Z 2023-01-11T21:48:12.9386173Z Generating XML reports... 
2023-01-11T21:48:12.9386622Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214526.xml 2023-01-11T21:48:12.9386995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9387172Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9387554Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9387807Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9387827Z 2023-01-11T21:48:12.9387919Z Running tests... 2023-01-11T21:48:12.9388189Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9388504Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9388833Z test_reduce_scatter_v_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports reduce_scatter_v (0.003s) 2023-01-11T21:48:12.9388854Z 2023-01-11T21:48:12.9389127Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9389243Z Ran 1 test in 0.003s 2023-01-11T21:48:12.9389262Z 2023-01-11T21:48:12.9389373Z OK (skipped=1) 2023-01-11T21:48:12.9389392Z 2023-01-11T21:48:12.9389520Z Generating XML reports... 
2023-01-11T21:48:12.9389967Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214529.xml 2023-01-11T21:48:12.9390323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9390501Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9390881Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9391080Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9391099Z 2023-01-11T21:48:12.9391210Z Running tests... 2023-01-11T21:48:12.9391476Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9391787Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9392036Z test_reduce_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9392238Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18697 2023-01-11T21:48:12.9392462Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18698 2023-01-11T21:48:12.9392833Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9393012Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9393397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9393593Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9393954Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9394130Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9394505Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9394681Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9394931Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9395177Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9395579Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9395982Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9396214Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9396551Z STAGE:2023-01-11 21:45:35 18697:18697 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9396780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9397183Z STAGE:2023-01-11 21:45:35 18698:18698 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9397446Z [1673473535.502430] [7e0e28e30a97:18698:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9397682Z [1673473536.511659] [7e0e28e30a97:18698:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9397969Z [1673473536.511659] [7e0e28e30a97:18698:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9398250Z [1673473535.480527] [7e0e28e30a97:18697:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9398485Z [1673473536.504693] [7e0e28e30a97:18697:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9398722Z [1673473536.504693] [7e0e28e30a97:18697:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9399285Z STAGE:2023-01-11 21:45:36 18698:18698 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:45:36 18697:18697 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9399306Z 2023-01-11T21:48:12.9399877Z STAGE:2023-01-11 21:45:36 18698:18698 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:45:36 18697:18697 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9399897Z 2023-01-11T21:48:12.9400230Z STAGE:2023-01-11 21:45:36 18698:18698 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9400552Z STAGE:2023-01-11 21:45:36 18697:18697 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9400886Z STAGE:2023-01-11 21:45:36 18698:18698 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9401197Z STAGE:2023-01-11 21:45:36 18697:18697 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9401547Z STAGE:2023-01-11 21:45:36 18698:18698 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9401893Z STAGE:2023-01-11 21:45:36 18697:18697 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9401999Z ok (5.949s) 2023-01-11T21:48:12.9402019Z 2023-01-11T21:48:12.9402290Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9402408Z Ran 1 test in 5.949s 2023-01-11T21:48:12.9402428Z 2023-01-11T21:48:12.9402524Z OK 2023-01-11T21:48:12.9402543Z 2023-01-11T21:48:12.9402669Z Generating XML reports... 
2023-01-11T21:48:12.9403117Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214531.xml 2023-01-11T21:48:12.9403470Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9403652Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9404036Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9404232Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9404252Z 2023-01-11T21:48:12.9404364Z Running tests... 2023-01-11T21:48:12.9404634Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9404949Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9405211Z test_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA reduce (0.002s) 2023-01-11T21:48:12.9405230Z 2023-01-11T21:48:12.9405492Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9405588Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9405664Z 2023-01-11T21:48:12.9405782Z OK (skipped=1) 2023-01-11T21:48:12.9405801Z 2023-01-11T21:48:12.9405928Z Generating XML reports... 
2023-01-11T21:48:12.9406375Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214540.xml 2023-01-11T21:48:12.9406746Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9406971Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9407362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9407555Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9407575Z 2023-01-11T21:48:12.9407666Z Running tests... 2023-01-11T21:48:12.9407930Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9408244Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9408518Z test_reduce_sum_cuda_twice (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA reduce (0.002s) 2023-01-11T21:48:12.9408538Z 2023-01-11T21:48:12.9408801Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9408916Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9408936Z 2023-01-11T21:48:12.9409045Z OK (skipped=1) 2023-01-11T21:48:12.9409064Z 2023-01-11T21:48:12.9409193Z Generating XML reports... 
2023-01-11T21:48:12.9409634Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214542.xml 2023-01-11T21:48:12.9409982Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9410160Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9410535Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9410732Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9410752Z 2023-01-11T21:48:12.9410862Z Running tests... 2023-01-11T21:48:12.9411124Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9411435Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9411697Z test_reduce_sum_twice (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9411898Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18877 2023-01-11T21:48:12.9412118Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18878 2023-01-11T21:48:12.9412491Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9412670Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9413201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9413398Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9413768Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9413948Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9414325Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9414497Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9414745Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9414991Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9415488Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9415886Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9416119Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9416514Z STAGE:2023-01-11 21:45:48 18878:18878 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9416756Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9417092Z STAGE:2023-01-11 21:45:48 18877:18877 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9417351Z [1673473548.769531] [7e0e28e30a97:18878:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9417592Z [1673473549.799578] [7e0e28e30a97:18878:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9417835Z [1673473549.799578] [7e0e28e30a97:18878:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9418175Z STAGE:2023-01-11 21:45:50 18878:18878 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9418528Z STAGE:2023-01-11 21:45:50 18878:18878 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9418806Z [1673473548.747683] [7e0e28e30a97:18877:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9419040Z [1673473549.783524] [7e0e28e30a97:18877:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9419278Z [1673473549.783524] [7e0e28e30a97:18877:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9419622Z STAGE:2023-01-11 21:45:50 18877:18877 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9419970Z STAGE:2023-01-11 21:45:50 18877:18877 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9420280Z STAGE:2023-01-11 21:45:50 18877:18877 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9420616Z STAGE:2023-01-11 21:45:50 18877:18877 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9420962Z STAGE:2023-01-11 21:45:50 18877:18877 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9421291Z STAGE:2023-01-11 21:45:50 18878:18878 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9421623Z STAGE:2023-01-11 21:45:50 18878:18878 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9421966Z STAGE:2023-01-11 21:45:50 18878:18878 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9422076Z ok (5.945s) 2023-01-11T21:48:12.9422096Z 2023-01-11T21:48:12.9422364Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9422461Z Ran 1 test in 5.945s 2023-01-11T21:48:12.9422500Z 2023-01-11T21:48:12.9422575Z OK 2023-01-11T21:48:12.9422594Z 2023-01-11T21:48:12.9422721Z Generating XML reports... 
2023-01-11T21:48:12.9423174Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214544.xml 2023-01-11T21:48:12.9423547Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9423726Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9424105Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9424374Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9424394Z 2023-01-11T21:48:12.9424505Z Running tests... 2023-01-11T21:48:12.9424758Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9425072Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9425339Z test_scatter (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:48:12.9425411Z 2023-01-11T21:48:12.9425691Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9425810Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9425829Z 2023-01-11T21:48:12.9425942Z OK (skipped=1) 2023-01-11T21:48:12.9425961Z 2023-01-11T21:48:12.9426087Z Generating XML reports... 
2023-01-11T21:48:12.9426533Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214553.xml 2023-01-11T21:48:12.9426910Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9427070Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9427450Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9427644Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9427664Z 2023-01-11T21:48:12.9427777Z Running tests... 2023-01-11T21:48:12.9428045Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9428360Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9428630Z test_scatter_checks (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:48:12.9428650Z 2023-01-11T21:48:12.9428915Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9429012Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9429054Z 2023-01-11T21:48:12.9429145Z OK (skipped=1) 2023-01-11T21:48:12.9429164Z 2023-01-11T21:48:12.9429289Z Generating XML reports... 
2023-01-11T21:48:12.9429732Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214555.xml 2023-01-11T21:48:12.9430103Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9430284Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9430662Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9430856Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9430875Z 2023-01-11T21:48:12.9430986Z Running tests... 2023-01-11T21:48:12.9431230Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9431545Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9431820Z test_scatter_complex (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:48:12.9431840Z 2023-01-11T21:48:12.9432102Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9432216Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9432235Z 2023-01-11T21:48:12.9432350Z OK (skipped=1) 2023-01-11T21:48:12.9432369Z 2023-01-11T21:48:12.9432495Z Generating XML reports... 
2023-01-11T21:48:12.9432936Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214558.xml 2023-01-11T21:48:12.9433304Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9433462Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9433913Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9434107Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9434127Z 2023-01-11T21:48:12.9434238Z Running tests... 2023-01-11T21:48:12.9434502Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9434860Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9435124Z test_scatter_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA gather (0.002s) 2023-01-11T21:48:12.9435144Z 2023-01-11T21:48:12.9435412Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9435527Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9435546Z 2023-01-11T21:48:12.9435638Z OK (skipped=1) 2023-01-11T21:48:12.9435657Z 2023-01-11T21:48:12.9435785Z Generating XML reports... 
2023-01-11T21:48:12.9436237Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214600.xml 2023-01-11T21:48:12.9436604Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9436782Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9437168Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9437364Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9437384Z 2023-01-11T21:48:12.9437497Z Running tests... 2023-01-11T21:48:12.9437744Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9438057Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9438326Z test_scatter_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA gather (0.002s) 2023-01-11T21:48:12.9438349Z 2023-01-11T21:48:12.9438612Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9438727Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9438746Z 2023-01-11T21:48:12.9438857Z OK (skipped=1) 2023-01-11T21:48:12.9438876Z 2023-01-11T21:48:12.9439003Z Generating XML reports... 
2023-01-11T21:48:12.9439451Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214602.xml 2023-01-11T21:48:12.9439821Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9439980Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9440361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9440555Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9440578Z 2023-01-11T21:48:12.9440689Z Running tests... 2023-01-11T21:48:12.9440954Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9441267Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9441541Z test_scatter_full_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:48:12.9441561Z 2023-01-11T21:48:12.9441827Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9441943Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9441962Z 2023-01-11T21:48:12.9442053Z OK (skipped=1) 2023-01-11T21:48:12.9442072Z 2023-01-11T21:48:12.9442198Z Generating XML reports... 
2023-01-11T21:48:12.9442642Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214605.xml 2023-01-11T21:48:12.9443011Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9443251Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9443639Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9443833Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9443853Z 2023-01-11T21:48:12.9443964Z Running tests... 2023-01-11T21:48:12.9444263Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9444591Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9444859Z test_scatter_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T21:48:12.9444879Z 2023-01-11T21:48:12.9445143Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9445260Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9445283Z 2023-01-11T21:48:12.9445395Z OK (skipped=1) 2023-01-11T21:48:12.9445414Z 2023-01-11T21:48:12.9445540Z Generating XML reports... 
2023-01-11T21:48:12.9445982Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214607.xml 2023-01-11T21:48:12.9446351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9446511Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9446890Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9447081Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9447099Z 2023-01-11T21:48:12.9447208Z Running tests... 2023-01-11T21:48:12.9447471Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9447781Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9448165Z test_scatter_object_list (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T21:48:12.9448185Z 2023-01-11T21:48:12.9448442Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9448558Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9448578Z 2023-01-11T21:48:12.9448668Z OK (skipped=1) 2023-01-11T21:48:12.9448687Z 2023-01-11T21:48:12.9448816Z Generating XML reports... 
2023-01-11T21:48:12.9449260Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214609.xml 2023-01-11T21:48:12.9449630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9449806Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9450186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9450382Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9450402Z 2023-01-11T21:48:12.9450515Z Running tests... 2023-01-11T21:48:12.9450780Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9451070Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9451317Z test_send_recv (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9451540Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19255 2023-01-11T21:48:12.9451757Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19256 2023-01-11T21:48:12.9452125Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9452301Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9452747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9453073Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9453424Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9453680Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9454071Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9454260Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9454511Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9454757Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9455245Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9455650Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9455882Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9456095Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9456374Z [1673473576.251392] [7e0e28e30a97:19255:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9456607Z [1673473577.033914] [7e0e28e30a97:19255:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9456849Z [1673473577.033914] [7e0e28e30a97:19255:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9457130Z [1673473576.252281] [7e0e28e30a97:19256:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9457360Z [1673473577.028097] [7e0e28e30a97:19256:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9457602Z [1673473577.028097] [7e0e28e30a97:19256:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9457709Z ok (5.447s) 2023-01-11T21:48:12.9457730Z 2023-01-11T21:48:12.9458001Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9458118Z Ran 1 test in 5.447s 2023-01-11T21:48:12.9458157Z 2023-01-11T21:48:12.9458234Z OK 2023-01-11T21:48:12.9458253Z 2023-01-11T21:48:12.9458381Z Generating XML reports... 
2023-01-11T21:48:12.9458969Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214612.xml 2023-01-11T21:48:12.9459360Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9459540Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9459920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9460117Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9460138Z 2023-01-11T21:48:12.9460251Z Running tests... 2023-01-11T21:48:12.9460498Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9460810Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9461094Z test_send_recv_any_source (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support send/recv from any source (0.002s) 2023-01-11T21:48:12.9461114Z 2023-01-11T21:48:12.9461469Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9461583Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9461603Z 2023-01-11T21:48:12.9461713Z OK (skipped=1) 2023-01-11T21:48:12.9461732Z 2023-01-11T21:48:12.9461858Z Generating XML reports... 
2023-01-11T21:48:12.9462305Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214620.xml 2023-01-11T21:48:12.9462726Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9462891Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9463280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9463471Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9463491Z 2023-01-11T21:48:12.9463601Z Running tests... 2023-01-11T21:48:12.9463868Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9464179Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9464491Z test_send_recv_any_source_autograd_profiler (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support send/recv from any source (0.002s) 2023-01-11T21:48:12.9464512Z 2023-01-11T21:48:12.9464775Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9464890Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9464910Z 2023-01-11T21:48:12.9465001Z OK (skipped=1) 2023-01-11T21:48:12.9465020Z 2023-01-11T21:48:12.9465145Z Generating XML reports... 
2023-01-11T21:48:12.9465586Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214622.xml 2023-01-11T21:48:12.9465958Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9466139Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9466520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9466715Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9466734Z 2023-01-11T21:48:12.9466843Z Running tests... 2023-01-11T21:48:12.9467091Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9467406Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9467708Z test_send_recv_any_source_torch_profiler (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support send/recv from any source (0.002s) 2023-01-11T21:48:12.9467727Z 2023-01-11T21:48:12.9467986Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9468100Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9468123Z 2023-01-11T21:48:12.9468234Z OK (skipped=1) 2023-01-11T21:48:12.9468252Z 2023-01-11T21:48:12.9468378Z Generating XML reports... 
2023-01-11T21:48:12.9468821Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214625.xml 2023-01-11T21:48:12.9469190Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9469351Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9469729Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9469921Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9469940Z 2023-01-11T21:48:12.9470049Z Running tests... 2023-01-11T21:48:12.9470313Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9470624Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9470964Z test_send_recv_autograd_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9471188Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19464 2023-01-11T21:48:12.9471406Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19465 2023-01-11T21:48:12.9471811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9471995Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9472380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9472574Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9472934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9473109Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9473481Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9473668Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9473898Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9474146Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9474546Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9474941Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9475171Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9475511Z STAGE:2023-01-11 21:46:31 19464:19464 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9475736Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9476062Z STAGE:2023-01-11 21:46:31 19465:19465 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9476344Z [1673473591.446848] [7e0e28e30a97:19465:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9476580Z [1673473592.474363] [7e0e28e30a97:19465:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9476802Z [1673473592.474363] [7e0e28e30a97:19465:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9477147Z STAGE:2023-01-11 21:46:32 19465:19465 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9477500Z STAGE:2023-01-11 21:46:32 19465:19465 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9477770Z [1673473591.423927] [7e0e28e30a97:19464:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9478004Z [1673473592.447264] [7e0e28e30a97:19464:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9478242Z [1673473592.447264] [7e0e28e30a97:19464:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9478581Z STAGE:2023-01-11 21:46:32 19464:19464 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9478931Z STAGE:2023-01-11 21:46:32 19464:19464 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9479037Z ok (5.944s) 2023-01-11T21:48:12.9479056Z 2023-01-11T21:48:12.9479368Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9479485Z Ran 1 test in 5.944s 2023-01-11T21:48:12.9479504Z 2023-01-11T21:48:12.9479599Z OK 2023-01-11T21:48:12.9479618Z 2023-01-11T21:48:12.9479744Z Generating XML reports... 2023-01-11T21:48:12.9480192Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214627.xml 2023-01-11T21:48:12.9480606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9480793Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9481180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9481373Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9481392Z 2023-01-11T21:48:12.9481484Z Running tests... 2023-01-11T21:48:12.9481756Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9482071Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9482314Z test_send_recv_nccl (__main__.TestDistBackendWithSpawn) ... 
skip: NCCL Send Recv Only (0.002s) 2023-01-11T21:48:12.9482334Z 2023-01-11T21:48:12.9482593Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9482707Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9482731Z 2023-01-11T21:48:12.9482842Z OK (skipped=1) 2023-01-11T21:48:12.9482861Z 2023-01-11T21:48:12.9482987Z Generating XML reports... 2023-01-11T21:48:12.9483410Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214635.xml 2023-01-11T21:48:12.9483779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9483958Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9484342Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9484535Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9484555Z 2023-01-11T21:48:12.9484666Z Running tests... 2023-01-11T21:48:12.9484930Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9485246Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9485512Z test_send_recv_nccl_autograd_profiler (__main__.TestDistBackendWithSpawn) ... skip: NCCL Send Recv Only (0.002s) 2023-01-11T21:48:12.9485532Z 2023-01-11T21:48:12.9485773Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9485886Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9485905Z 2023-01-11T21:48:12.9486016Z OK (skipped=1) 2023-01-11T21:48:12.9486035Z 2023-01-11T21:48:12.9486162Z Generating XML reports... 
2023-01-11T21:48:12.9486602Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214638.xml 2023-01-11T21:48:12.9486973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9487148Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9487528Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9487721Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9487740Z 2023-01-11T21:48:12.9487830Z Running tests... 2023-01-11T21:48:12.9488092Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9488402Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9488662Z test_send_recv_nccl_torch_profiler (__main__.TestDistBackendWithSpawn) ... skip: NCCL Send Recv Only (0.002s) 2023-01-11T21:48:12.9488747Z 2023-01-11T21:48:12.9489017Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9489133Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9489152Z 2023-01-11T21:48:12.9489261Z OK (skipped=1) 2023-01-11T21:48:12.9489280Z 2023-01-11T21:48:12.9489405Z Generating XML reports... 
2023-01-11T21:48:12.9489876Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214640.xml 2023-01-11T21:48:12.9490262Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9490442Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9490821Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9491012Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9491036Z 2023-01-11T21:48:12.9491145Z Running tests... 2023-01-11T21:48:12.9491406Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9491718Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9491986Z test_send_recv_torch_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9492193Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19677 2023-01-11T21:48:12.9492412Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19678 2023-01-11T21:48:12.9492785Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9493105Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9493492Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9493690Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9494062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9494239Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9494595Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9494791Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9495038Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9495284Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9495689Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9496091Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9496323Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9496663Z STAGE:2023-01-11 21:46:46 19678:19678 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9496897Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9497211Z STAGE:2023-01-11 21:46:47 19677:19677 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9497490Z [1673473607.048204] [7e0e28e30a97:19678:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9497724Z [1673473608.094107] [7e0e28e30a97:19678:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9498097Z [1673473608.094107] [7e0e28e30a97:19678:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9498451Z STAGE:2023-01-11 21:46:48 19678:19678 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9498799Z STAGE:2023-01-11 21:46:48 19678:19678 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9499139Z [1673473607.048229] [7e0e28e30a97:19677:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9499382Z [1673473608.102236] [7e0e28e30a97:19677:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9499618Z [1673473608.102236] [7e0e28e30a97:19677:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9499967Z STAGE:2023-01-11 21:46:48 19677:19677 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9500303Z STAGE:2023-01-11 21:46:48 19677:19677 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9500410Z ok (5.858s) 2023-01-11T21:48:12.9500430Z 2023-01-11T21:48:12.9500695Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9500809Z Ran 1 test in 5.858s 2023-01-11T21:48:12.9500829Z 2023-01-11T21:48:12.9500928Z OK 2023-01-11T21:48:12.9500947Z 2023-01-11T21:48:12.9501078Z Generating XML reports... 2023-01-11T21:48:12.9501526Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214643.xml 2023-01-11T21:48:12.9501897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9502075Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9502436Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9502634Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9502654Z 2023-01-11T21:48:12.9502764Z Running tests... 2023-01-11T21:48:12.9503032Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9503346Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9503608Z test_send_recv_with_tag (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9503829Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19791 2023-01-11T21:48:12.9504047Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19792 2023-01-11T21:48:12.9504400Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9504580Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9504965Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9505158Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9505524Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9505703Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9506082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9506274Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9506520Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9506746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9507216Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9507617Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9507848Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9508123Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9508411Z [1673473615.332581] [7e0e28e30a97:19792:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9508648Z [1673473616.136201] [7e0e28e30a97:19792:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9508891Z [1673473616.136201] [7e0e28e30a97:19792:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9509169Z [1673473615.331710] [7e0e28e30a97:19791:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9509400Z [1673473616.134509] [7e0e28e30a97:19791:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9509623Z [1673473616.134509] [7e0e28e30a97:19791:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9509729Z ok (5.428s) 2023-01-11T21:48:12.9509749Z 2023-01-11T21:48:12.9510028Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9510144Z Ran 1 test in 5.428s 2023-01-11T21:48:12.9510164Z 2023-01-11T21:48:12.9510259Z OK 2023-01-11T21:48:12.9510278Z 2023-01-11T21:48:12.9510405Z Generating XML reports... 
2023-01-11T21:48:12.9510853Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214651.xml 2023-01-11T21:48:12.9511232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9511391Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9511773Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9511968Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9511988Z 2023-01-11T21:48:12.9512100Z Running tests... 2023-01-11T21:48:12.9512369Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9512680Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9512966Z test_send_recv_with_tag_autograd_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9513188Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19901 2023-01-11T21:48:12.9513412Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19902 2023-01-11T21:48:12.9513766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9513946Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9514331Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9514526Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9514892Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9515069Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9515441Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9515699Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9515930Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9516175Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9516633Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9517042Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9517276Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9517613Z STAGE:2023-01-11 21:47:03 19901:19901 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9517843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9518175Z STAGE:2023-01-11 21:47:03 19902:19902 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9518455Z [1673473623.382753] [7e0e28e30a97:19901:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9518698Z [1673473624.433744] [7e0e28e30a97:19901:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9518922Z [1673473624.433744] [7e0e28e30a97:19901:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9519267Z STAGE:2023-01-11 21:47:04 19901:19901 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9519544Z [1673473623.403799] [7e0e28e30a97:19902:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9519781Z [1673473624.452295] [7e0e28e30a97:19902:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9520021Z [1673473624.452295] [7e0e28e30a97:19902:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9520363Z STAGE:2023-01-11 21:47:04 19902:19902 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9520715Z STAGE:2023-01-11 21:47:04 19901:19901 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9521065Z STAGE:2023-01-11 21:47:04 19902:19902 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9521171Z ok (5.936s) 2023-01-11T21:48:12.9521190Z 2023-01-11T21:48:12.9521438Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9521553Z Ran 1 test in 5.936s 2023-01-11T21:48:12.9521573Z 2023-01-11T21:48:12.9521668Z OK 2023-01-11T21:48:12.9521687Z 2023-01-11T21:48:12.9521818Z Generating XML reports... 2023-01-11T21:48:12.9522267Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214659.xml 2023-01-11T21:48:12.9522637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9522816Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9523200Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9523396Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9523415Z 2023-01-11T21:48:12.9523508Z Running tests... 
2023-01-11T21:48:12.9523777Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9524091Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9524373Z test_send_recv_with_tag_torch_profiler (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9524658Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20015 2023-01-11T21:48:12.9524878Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20016 2023-01-11T21:48:12.9525253Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9525479Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9525852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9526044Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9526408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9526584Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9526963Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9527152Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9527400Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9527647Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9528052Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9528430Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9528663Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9528892Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9529232Z STAGE:2023-01-11 21:47:11 20016:20016 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9529556Z STAGE:2023-01-11 21:47:11 20015:20015 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:48:12.9529833Z [1673473631.750794] [7e0e28e30a97:20015:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9530071Z [1673473632.803032] [7e0e28e30a97:20015:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9530348Z [1673473632.803032] [7e0e28e30a97:20015:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9530693Z STAGE:2023-01-11 21:47:13 20015:20015 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9530950Z [1673473631.771238] [7e0e28e30a97:20016:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9531187Z [1673473632.797771] [7e0e28e30a97:20016:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9531425Z [1673473632.797771] [7e0e28e30a97:20016:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9531767Z STAGE:2023-01-11 21:47:13 20016:20016 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:48:12.9532118Z STAGE:2023-01-11 21:47:13 20015:20015 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9532465Z STAGE:2023-01-11 21:47:13 20016:20016 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:48:12.9532570Z ok (5.831s) 2023-01-11T21:48:12.9532590Z 2023-01-11T21:48:12.9533025Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9533244Z Ran 1 test in 5.831s 2023-01-11T21:48:12.9533265Z 2023-01-11T21:48:12.9533342Z OK 2023-01-11T21:48:12.9533381Z 2023-01-11T21:48:12.9533491Z Generating XML reports... 2023-01-11T21:48:12.9533955Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214707.xml 2023-01-11T21:48:12.9534328Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9534571Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9534971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9535162Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9535181Z 2023-01-11T21:48:12.9535292Z Running tests... 
2023-01-11T21:48:12.9535559Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9535857Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9536143Z test_sparse_all_reduce_sum (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo backend support sparse all reduce (0.002s) 2023-01-11T21:48:12.9536162Z 2023-01-11T21:48:12.9536420Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9536534Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9536553Z 2023-01-11T21:48:12.9536664Z OK (skipped=1) 2023-01-11T21:48:12.9536686Z 2023-01-11T21:48:12.9536816Z Generating XML reports... 2023-01-11T21:48:12.9537261Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214716.xml 2023-01-11T21:48:12.9537631Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9537808Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9538167Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9538364Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9538384Z 2023-01-11T21:48:12.9538494Z Running tests... 2023-01-11T21:48:12.9538760Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9539072Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9539368Z test_sparse_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) ... 
skip: Only Gloo backend support sparse all reduce (0.002s) 2023-01-11T21:48:12.9539389Z 2023-01-11T21:48:12.9539653Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9539768Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9539787Z 2023-01-11T21:48:12.9539898Z OK (skipped=1) 2023-01-11T21:48:12.9539917Z 2023-01-11T21:48:12.9540023Z Generating XML reports... 2023-01-11T21:48:12.9540472Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214718.xml 2023-01-11T21:48:12.9540842Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9541020Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9541400Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9541593Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9541613Z 2023-01-11T21:48:12.9541724Z Running tests... 2023-01-11T21:48:12.9541988Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9542278Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9542548Z test_stateless_api_with_ddp (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9542836Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20195 2023-01-11T21:48:12.9543056Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20196 2023-01-11T21:48:12.9543439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9543617Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9544042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9544241Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9544610Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9544767Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9545139Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9545337Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9545582Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9545827Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9546232Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9546632Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9546863Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9547075Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9547356Z [1673473645.804520] [7e0e28e30a97:20196:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9547590Z [1673473645.810475] [7e0e28e30a97:20196:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9547831Z [1673473645.810475] [7e0e28e30a97:20196:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9548110Z [1673473645.801527] [7e0e28e30a97:20195:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9548343Z [1673473645.807405] [7e0e28e30a97:20195:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9548581Z [1673473645.807405] [7e0e28e30a97:20195:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9548692Z ok (6.050s) 2023-01-11T21:48:12.9548712Z 2023-01-11T21:48:12.9548987Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9549103Z Ran 1 test in 6.050s 2023-01-11T21:48:12.9549123Z 2023-01-11T21:48:12.9549199Z OK 2023-01-11T21:48:12.9549218Z 2023-01-11T21:48:12.9549345Z Generating XML reports... 
2023-01-11T21:48:12.9549795Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214721.xml 2023-01-11T21:48:12.9550171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9550350Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9550728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9550922Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9551005Z 2023-01-11T21:48:12.9551121Z Running tests... 2023-01-11T21:48:12.9551393Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9551688Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9551951Z test_static_graph_api_cpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9552221Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20313 2023-01-11T21:48:12.9552448Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20314 2023-01-11T21:48:12.9552823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9553000Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9553381Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9553578Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9553920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9554095Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9554467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9554661Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9554908Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9555153Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9555553Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9555949Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9556187Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9556396Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9556679Z [1673473653.655824] [7e0e28e30a97:20313:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9556914Z [1673473654.450026] [7e0e28e30a97:20313:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9557153Z [1673473654.450026] [7e0e28e30a97:20313:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9557427Z [1673473653.678515] [7e0e28e30a97:20314:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9557661Z [1673473654.446991] [7e0e28e30a97:20314:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9557901Z [1673473654.446991] [7e0e28e30a97:20314:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9558006Z ok (5.461s) 2023-01-11T21:48:12.9558026Z 2023-01-11T21:48:12.9558307Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9558403Z Ran 1 test in 5.462s 2023-01-11T21:48:12.9558442Z 2023-01-11T21:48:12.9558519Z OK 2023-01-11T21:48:12.9558562Z 2023-01-11T21:48:12.9558693Z Generating XML reports... 
2023-01-11T21:48:12.9559144Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214729.xml 2023-01-11T21:48:12.9559516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9559763Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9560152Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9560345Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9560365Z 2023-01-11T21:48:12.9560476Z Running tests... 2023-01-11T21:48:12.9560770Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9561097Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9561399Z test_sync_bn_logged (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl & Gloo backend support DistributedDataParallel (0.002s) 2023-01-11T21:48:12.9561420Z 2023-01-11T21:48:12.9561682Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9561795Z Ran 1 test in 0.002s 2023-01-11T21:48:12.9561815Z 2023-01-11T21:48:12.9561929Z OK (skipped=1) 2023-01-11T21:48:12.9561949Z 2023-01-11T21:48:12.9562074Z Generating XML reports... 
2023-01-11T21:48:12.9562519Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214737.xml 2023-01-11T21:48:12.9562888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9563045Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9563426Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9563618Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9563639Z 2023-01-11T21:48:12.9563750Z Running tests... 2023-01-11T21:48:12.9564013Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9564325Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9564621Z test_undefined_grad_parity_unused_parameters (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9564840Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20460 2023-01-11T21:48:12.9565040Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20461 2023-01-11T21:48:12.9565412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9565589Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9565967Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9566162Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9566524Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9566701Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9567079Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9567270Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9567500Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9567749Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9568148Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9568543Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9568774Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9569069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9569350Z [1673473664.765746] [7e0e28e30a97:20461:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9569586Z [1673473664.771503] [7e0e28e30a97:20461:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9569869Z [1673473664.771503] [7e0e28e30a97:20461:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9570654Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T21:48:12.9570931Z [1673473664.757647] [7e0e28e30a97:20460:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9571164Z [1673473664.764218] [7e0e28e30a97:20460:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9571384Z [1673473664.764218] [7e0e28e30a97:20460:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9572161Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. 
This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T21:48:12.9572272Z ok (5.940s) 2023-01-11T21:48:12.9572292Z 2023-01-11T21:48:12.9572568Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9572683Z Ran 1 test in 5.940s 2023-01-11T21:48:12.9572703Z 2023-01-11T21:48:12.9572799Z OK 2023-01-11T21:48:12.9572822Z 2023-01-11T21:48:12.9573134Z Generating XML reports... 2023-01-11T21:48:12.9573596Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214740.xml 2023-01-11T21:48:12.9573969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9574149Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9574510Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9574712Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9574732Z 2023-01-11T21:48:12.9574844Z Running tests... 2023-01-11T21:48:12.9575113Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9575428Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9575718Z test_verify_model_across_rank_with_logger (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9575939Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20578 2023-01-11T21:48:12.9576161Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20579 2023-01-11T21:48:12.9576538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9576784Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9577173Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9577366Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9577731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9577967Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9578363Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9578553Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9578802Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9579028Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9579433Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9579828Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9580060Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9580293Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9580536Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.9580779Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.9581170Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9581572Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9581791Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:48:12.9582027Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:48:12.9582420Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:48:12.9582815Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:48:12.9583093Z [1673473673.145109] [7e0e28e30a97:20579:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9583331Z [1673473673.150870] [7e0e28e30a97:20579:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9583577Z [1673473673.150870] [7e0e28e30a97:20579:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9583960Z [1673473678.546704] [7e0e28e30a97:20579:0] tag_match.c:62 UCX WARN unexpected tag-receive descriptor 0x25bda680 was not matched 2023-01-11T21:48:12.9584236Z [1673473673.141649] [7e0e28e30a97:20578:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9584469Z [1673473673.147111] [7e0e28e30a97:20578:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9584706Z [1673473673.147111] [7e0e28e30a97:20578:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9585004Z [1673473678.510129] [7e0e28e30a97:20578:1] ucc_schedule.h:189 UCC WARN timeout 5 sec. has expired on req 0x232dcd80, seq_num 5, TL_UCP, team_id 1, size 2, rank 0, ctx_rank 0: Barrier n/a inplace=0 bytes=0 2023-01-11T21:48:12.9585344Z [1673473678.556831] [7e0e28e30a97:20578:0] mpool.c:55 UCX WARN object 0x23411940 {flags:0x20040 recv length 0 host memory} was not returned to mpool ucp_requests 2023-01-11T21:48:12.9585451Z ok (10.428s) 2023-01-11T21:48:12.9585472Z 2023-01-11T21:48:12.9585751Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9585917Z Ran 1 test in 10.428s 2023-01-11T21:48:12.9585939Z 2023-01-11T21:48:12.9586037Z OK 2023-01-11T21:48:12.9586057Z 2023-01-11T21:48:12.9586188Z Generating XML reports... 
2023-01-11T21:48:12.9586649Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214748.xml 2023-01-11T21:48:12.9587023Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9587181Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9587568Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9587761Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9587781Z 2023-01-11T21:48:12.9587896Z Running tests... 2023-01-11T21:48:12.9588165Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9588480Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T21:48:12.9588769Z test_verify_model_across_rank_without_logger (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T21:48:12.9588992Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20698 2023-01-11T21:48:12.9589190Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20699 2023-01-11T21:48:12.9589560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9589743Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9590123Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9590315Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9590680Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:48:12.9590855Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:48:12.9591229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:48:12.9591418Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:48:12.9591645Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T21:48:12.9591896Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T21:48:12.9592296Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T21:48:12.9592693Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T21:48:12.9592930Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T21:48:12.9593158Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T21:48:12.9593398Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T21:48:12.9593639Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T21:48:12.9594034Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9594478Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T21:48:12.9594721Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T21:48:12.9594959Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T21:48:12.9595401Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:48:12.9595803Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T21:48:12.9596082Z [1673473686.176023] [7e0e28e30a97:20699:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T21:48:12.9596317Z [1673473686.181812] [7e0e28e30a97:20699:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9596563Z [1673473686.181812] [7e0e28e30a97:20699:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9596949Z [1673473691.578249] [7e0e28e30a97:20699:0] tag_match.c:62 UCX WARN unexpected tag-receive descriptor 0x242fa140 was not matched 2023-01-11T21:48:12.9597225Z [1673473686.173570] [7e0e28e30a97:20698:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T21:48:12.9597437Z [1673473686.178844] [7e0e28e30a97:20698:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T21:48:12.9597676Z [1673473686.178844] [7e0e28e30a97:20698:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T21:48:12.9597990Z [1673473691.541589] [7e0e28e30a97:20698:1] ucc_schedule.h:189 UCC WARN timeout 5 sec. has expired on req 0x21a1b000, seq_num 5, TL_UCP, team_id 1, size 2, rank 0, ctx_rank 0: Barrier n/a inplace=0 bytes=0 2023-01-11T21:48:12.9598270Z [1673473691.588260] [7e0e28e30a97:20698:0] mpool.c:55 UCX WARN object 0x21a567c0 {flags:0x20040 recv length 0 host memory} was not returned to mpool ucp_requests 2023-01-11T21:48:12.9598376Z ok (10.524s) 2023-01-11T21:48:12.9598396Z 2023-01-11T21:48:12.9598671Z ---------------------------------------------------------------------- 2023-01-11T21:48:12.9598787Z Ran 1 test in 10.524s 2023-01-11T21:48:12.9598806Z 2023-01-11T21:48:12.9598901Z OK 2023-01-11T21:48:12.9598921Z 2023-01-11T21:48:12.9599047Z Generating XML reports... 
2023-01-11T21:48:12.9599494Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214801.xml 2023-01-11T21:48:12.9599514Z 2023-01-11T21:48:12.9599996Z ##[endgroup] 2023-01-11T21:48:12.9600467Z FINISHED PRINTING LOG FILE of distributed/test_distributed_spawn (/var/lib/jenkins/workspace/test/test-reports/distributed-test_distributed_spawn_lx5pfkm6) 2023-01-11T21:48:12.9600491Z 2023-01-11T21:48:12.9600705Z Running distributed tests for the ucc backend with file init_method in shard 3 of 3 2023-01-11T21:48:12.9601219Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:48:12.730832] 2023-01-11T22:12:51.7553265Z 2023-01-11T22:12:51.7556012Z Expand the folded group to see the log file of distributed/test_distributed_spawn 2023-01-11T22:12:51.7557039Z ##[group]PRINTING LOG FILE of distributed/test_distributed_spawn (/var/lib/jenkins/workspace/test/test-reports/distributed-test_distributed_spawn_gmgwclcw) 2023-01-11T22:12:51.7557948Z 2023-01-11T22:12:51.7610577Z , <__main__.TestDistBackendWithSpawn testMethod=test_3_level_hierarchical_model_averager>, <__main__.TestDistBackendWithSpawn testMethod=test_Backend_enum_class>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallelCPU>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallelCPU_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_2D_Input>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Channels_Last>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value>, 
<__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_No_Affine>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_non_default_stream>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_requires_grad>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedDataParallel_with_amp_and_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_DistributedSampler_padding>, <__main__.TestDistBackendWithSpawn testMethod=test_SyncBatchNorm_process_group>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync_allreduce_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync_allreduce_with_then_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_accumulate_gradients_no_sync_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_simple>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_coalesced_with_empty>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_full_group>, <__main__.TestDistBackendWithSpawn 
testMethod=test_all_gather_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_into_cat_tensor_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_into_stack_tensor_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_multigpu_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_object_default_pg>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_object_subgroup>, <__main__.TestDistBackendWithSpawn testMethod=test_all_gather_v_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_full_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_max_complex_unsupported>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_coalesced_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_complex_unsupported_ops>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_full_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_full_group_min>, <__main__.TestDistBackendWithSpawn 
testMethod=test_all_reduce_full_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_full_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_max>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_min>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_multigpu_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_product>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_result_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_async>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_cuda_async>, <__main__.TestDistBackendWithSpawn testMethod=test_all_reduce_sum_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split>, 
<__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_equal_split_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_group>, <__main__.TestDistBackendWithSpawn testMethod=test_all_to_all_single_unequal_split_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_average_parameters>, <__main__.TestDistBackendWithSpawn testMethod=test_backend_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_backend_group>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_full_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_group>, <__main__.TestDistBackendWithSpawn 
testMethod=test_barrier_group_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_timeout_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_timeout_global>, <__main__.TestDistBackendWithSpawn testMethod=test_barrier_timeout_group>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_gloo>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_gloo_tags>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_mixed_backend_err>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_no_rank_zero_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_op_err>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_op_list_err>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_ring_exchange_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_self_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_batch_isend_irecv_tensor_err>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_group>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_broadcast_object_list>, <__main__.TestDistBackendWithSpawn testMethod=test_compute_bucket_assignment_by_size_sparse_error_with_logger>, <__main__.TestDistBackendWithSpawn testMethod=test_compute_bucket_assignment_by_size_sparse_error_without_logger>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_apply_optim_in_backward>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_apply_optim_in_backward_grad_as_bucket_view_false>, <__main__.TestDistBackendWithSpawn 
testMethod=test_ddp_apply_optim_in_backward_ignored_params>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_broadcast_buffer>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_broadcast_buffer_via_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_buffer_hook_allreduce>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_buffer_hook_allreduce_return_future>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_build_debug_param_to_name_mapping>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_build_debug_param_to_name_mapping_requires_grad>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_comm_hook_logging>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_control_flow_different_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_control_flow_same_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_create_graph>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_device>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_forward_backward_hook>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_grad_div_uneven_inputs>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_allreduce>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_allreduce_process_group>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_post_localSGD>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_parity_powerSGD>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_pickling_powerSGD>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adam_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adam_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_False>, <__main__.TestDistBackendWithSpawn 
testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_False>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_True>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_ignore_params_arg>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_inference>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_join_model_equivalence>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_logging_data_cpu>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_logging_data_gpu>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_model_diff_num_params_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_model_diff_shape_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_multiple_nested_unused_params_err_ignore_params>, <__main__.TestDistBackendWithSpawn 
testMethod=test_ddp_multiple_nested_unused_params_error>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_namedtuple>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_new_tensor_in_fwd>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_new_tensor_in_fwd_static_graph>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_profiling_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_profiling_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_python_error_logged>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_returns_tensor_with_no_grad>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_shared_grad_acc_unused_params>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_static_graph_nested_types>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_sync_bn_training_vs_eval>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_sync_module_states>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_input_exception>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_input_join_disable>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_inputs>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_uneven_inputs_stop_iteration_sync_bn>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_unused_params_rebuild_buckets_exception>, <__main__.TestDistBackendWithSpawn testMethod=test_ddp_zero_output_features>, <__main__.TestDistBackendWithSpawn testMethod=test_destroy_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_destroy_group>, <__main__.TestDistBackendWithSpawn testMethod=test_detect_ddp_is_actually_static>, <__main__.TestDistBackendWithSpawn testMethod=test_different_graph_across_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_dump_DDP_relevant_env_vars>, <__main__.TestDistBackendWithSpawn testMethod=test_gather>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_checks>, <__main__.TestDistBackendWithSpawn 
testMethod=test_gather_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_group>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_object>, <__main__.TestDistBackendWithSpawn testMethod=test_gather_object_subgroup>, <__main__.TestDistBackendWithSpawn testMethod=test_get_backend>, <__main__.TestDistBackendWithSpawn testMethod=test_get_future>, <__main__.TestDistBackendWithSpawn testMethod=test_get_rank>, <__main__.TestDistBackendWithSpawn testMethod=test_get_rank_size_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_get_rank_size_group>, <__main__.TestDistBackendWithSpawn testMethod=test_invalid_static_graph>, <__main__.TestDistBackendWithSpawn testMethod=test_irecv>, <__main__.TestDistBackendWithSpawn testMethod=test_isend>, <__main__.TestDistBackendWithSpawn testMethod=test_isend_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_isend_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_allreduce_hang>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_allreduce_hang_wait_all_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_failure_order>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_gloo>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_gloo_rank_0_timeout>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_gloo_subgroup>, <__main__.TestDistBackendWithSpawn testMethod=test_monitored_barrier_wait_all_ranks>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_allgather>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_allreduce>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_broadcast>, <__main__.TestDistBackendWithSpawn testMethod=test_nccl_backend_bool_reduce>, <__main__.TestDistBackendWithSpawn 
testMethod=test_nccl_high_priority_stream>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_by_enumeration>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_by_enumeration_input_rank_exceeds_world_size>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_by_enumeration_negative_input_rank>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_group_size_exceeds_world_size>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_overlap_not_allowed>, <__main__.TestDistBackendWithSpawn testMethod=test_new_subgroups_world_size_not_divisible_by_group_size>, <__main__.TestDistBackendWithSpawn testMethod=test_output_unused_in_loss_dict_module>, <__main__.TestDistBackendWithSpawn testMethod=test_output_unused_in_loss_tuple_module>, <__main__.TestDistBackendWithSpawn testMethod=test_periodic_model_averager>, <__main__.TestDistBackendWithSpawn testMethod=test_periodic_model_averager_param_group>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity_with_hierarchical_sgd>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view>, <__main__.TestDistBackendWithSpawn testMethod=test_post_localSGD_optimizer_step_reload>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_min>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_full_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_max>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_min>, 
<__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_product>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_group_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_max>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_min>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_multigpu>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_product>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_scatter_tensor_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_scatter_v_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum_cuda_twice>, <__main__.TestDistBackendWithSpawn testMethod=test_reduce_sum_twice>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_checks>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_cuda_complex>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_full_group>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_group>, <__main__.TestDistBackendWithSpawn testMethod=test_scatter_object_list>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_any_source>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_any_source_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_any_source_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_nccl>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_nccl_autograd_profiler>, <__main__.TestDistBackendWithSpawn 
testMethod=test_send_recv_nccl_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_with_tag>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_with_tag_autograd_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_send_recv_with_tag_torch_profiler>, <__main__.TestDistBackendWithSpawn testMethod=test_sparse_all_reduce_sum>, <__main__.TestDistBackendWithSpawn testMethod=test_sparse_all_reduce_sum_cuda>, <__main__.TestDistBackendWithSpawn testMethod=test_stateless_api_with_ddp>, <__main__.TestDistBackendWithSpawn testMethod=test_static_graph_api_cpu>, <__main__.TestDistBackendWithSpawn testMethod=test_sync_bn_logged>, <__main__.TestDistBackendWithSpawn testMethod=test_undefined_grad_parity_unused_parameters>, <__main__.TestDistBackendWithSpawn testMethod=test_verify_model_across_rank_with_logger>, <__main__.TestDistBackendWithSpawn testMethod=test_verify_model_across_rank_without_logger>]> 2023-01-11T22:12:51.7676145Z test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7677076Z test_3_level_hierarchical_model_averager (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7677834Z test_Backend_enum_class (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7678550Z test_DistributedDataParallel (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7679308Z test_DistributedDataParallelCPU (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7680158Z test_DistributedDataParallelCPU_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7681066Z test_DistributedDataParallel_SyncBatchNorm (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7681914Z test_DistributedDataParallel_SyncBatchNorm_2D_Input (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7682831Z test_DistributedDataParallel_SyncBatchNorm_Channels_Last 
(__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7683948Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7684932Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7685803Z test_DistributedDataParallel_SyncBatchNorm_No_Affine (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7686853Z test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7687754Z test_DistributedDataParallel_non_default_stream (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7688614Z test_DistributedDataParallel_requires_grad (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7689539Z test_DistributedDataParallel_with_amp_and_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7690445Z test_DistributedSampler_padding (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7691315Z test_SyncBatchNorm_process_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7692184Z test_accumulate_gradients_no_sync (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7693450Z test_accumulate_gradients_no_sync_allreduce_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7694267Z test_accumulate_gradients_no_sync_allreduce_with_then_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7694758Z test_accumulate_gradients_no_sync_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7695325Z test_all_gather (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7695779Z test_all_gather_coalesced_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7696219Z test_all_gather_coalesced_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7696630Z test_all_gather_coalesced_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7697059Z test_all_gather_coalesced_simple 
(__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7697494Z test_all_gather_coalesced_with_empty (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7697895Z test_all_gather_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7698288Z test_all_gather_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7698685Z test_all_gather_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7699137Z test_all_gather_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7699846Z test_all_gather_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7700281Z test_all_gather_into_cat_tensor_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7700719Z test_all_gather_into_stack_tensor_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7701120Z test_all_gather_multigpu (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7701537Z test_all_gather_multigpu_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7701964Z test_all_gather_object_default_pg (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7702391Z test_all_gather_object_subgroup (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7702784Z test_all_gather_v_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7703199Z test_all_reduce_coalesced_full_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7703649Z test_all_reduce_coalesced_full_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7704085Z test_all_reduce_coalesced_full_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7704543Z test_all_reduce_coalesced_full_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7704982Z test_all_reduce_coalesced_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7705395Z test_all_reduce_coalesced_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7705833Z test_all_reduce_coalesced_group_product (__main__.TestDistBackendWithSpawn) 
2023-01-11T22:12:51.7706278Z test_all_reduce_coalesced_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7706822Z test_all_reduce_coalesced_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7707250Z test_all_reduce_coalesced_max_complex_unsupported (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7707702Z test_all_reduce_coalesced_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7708121Z test_all_reduce_coalesced_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7708523Z test_all_reduce_coalesced_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7709021Z test_all_reduce_complex_unsupported_ops (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7709477Z test_all_reduce_full_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7709890Z test_all_reduce_full_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7710291Z test_all_reduce_full_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7710712Z test_all_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7711118Z test_all_reduce_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7711513Z test_all_reduce_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7711924Z test_all_reduce_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7712328Z test_all_reduce_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7712719Z test_all_reduce_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7713083Z test_all_reduce_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7713481Z test_all_reduce_multigpu (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7713895Z test_all_reduce_multigpu_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7714296Z test_all_reduce_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7714699Z test_all_reduce_result_cuda 
(__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7715090Z test_all_reduce_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7715461Z test_all_reduce_sum_async (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7715874Z test_all_reduce_sum_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7716280Z test_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7716682Z test_all_reduce_sum_cuda_async (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7717084Z test_all_reduce_sum_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7717479Z test_all_to_all (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7717862Z test_all_to_all_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7718227Z test_all_to_all_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7718628Z test_all_to_all_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7719035Z test_all_to_all_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7719431Z test_all_to_all_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7719837Z test_all_to_all_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7720236Z test_all_to_all_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7720653Z test_all_to_all_single_equal_split (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7721069Z test_all_to_all_single_equal_split_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7721520Z test_all_to_all_single_equal_split_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7721974Z test_all_to_all_single_equal_split_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7722420Z test_all_to_all_single_equal_split_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7722886Z test_all_to_all_single_equal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7723345Z 
test_all_to_all_single_equal_split_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7723795Z test_all_to_all_single_equal_split_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7724219Z test_all_to_all_single_unequal_split (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7724761Z test_all_to_all_single_unequal_split_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7725216Z test_all_to_all_single_unequal_split_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7725650Z test_all_to_all_single_unequal_split_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7726123Z test_all_to_all_single_unequal_split_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7726636Z test_all_to_all_single_unequal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7727108Z test_all_to_all_single_unequal_split_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7727547Z test_all_to_all_single_unequal_split_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7727981Z test_average_parameters (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7728379Z test_backend_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7728748Z test_backend_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7729121Z test_barrier (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7729497Z test_barrier_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7729864Z test_barrier_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7730267Z test_barrier_full_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7730664Z test_barrier_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7731059Z test_barrier_group_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7731445Z test_barrier_timeout_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7731867Z 
test_barrier_timeout_global (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7732272Z test_barrier_timeout_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7732653Z test_batch_isend_irecv_gloo (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7733868Z test_batch_isend_irecv_gloo_tags (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7734577Z test_batch_isend_irecv_mixed_backend_err (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7735335Z test_batch_isend_irecv_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7736052Z test_batch_isend_irecv_no_rank_zero_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7736808Z test_batch_isend_irecv_op_err (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7737697Z test_batch_isend_irecv_op_list_err (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7738477Z test_batch_isend_irecv_ring_exchange_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7739322Z test_batch_isend_irecv_self_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7740172Z test_batch_isend_irecv_tensor_err (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7740579Z test_broadcast (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7740936Z test_broadcast_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7741340Z test_broadcast_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7741737Z test_broadcast_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7742120Z test_broadcast_multigpu (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7742524Z test_broadcast_object_list (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7742986Z test_compute_bucket_assignment_by_size_sparse_error_with_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7743487Z test_compute_bucket_assignment_by_size_sparse_error_without_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7743955Z test_ddp_apply_optim_in_backward 
(__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7744415Z test_ddp_apply_optim_in_backward_grad_as_bucket_view_false (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7744894Z test_ddp_apply_optim_in_backward_ignored_params (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7745314Z test_ddp_broadcast_buffer (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7745863Z test_ddp_broadcast_buffer_via_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7746283Z test_ddp_buffer_hook_allreduce (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7746704Z test_ddp_buffer_hook_allreduce_return_future (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7747158Z test_ddp_build_debug_param_to_name_mapping (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7747697Z test_ddp_build_debug_param_to_name_mapping_requires_grad (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7748151Z test_ddp_comm_hook_logging (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7748561Z test_ddp_control_flow_different_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7749002Z test_ddp_control_flow_same_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7749418Z test_ddp_create_graph (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7749779Z test_ddp_device (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7750180Z test_ddp_forward_backward_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7750597Z test_ddp_grad_div_uneven_inputs (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7750998Z test_ddp_hook_parity_allreduce (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7751435Z test_ddp_hook_parity_allreduce_process_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7751878Z test_ddp_hook_parity_post_localSGD (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7752302Z test_ddp_hook_parity_powerSGD (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7752699Z 
test_ddp_hook_pickling_powerSGD (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7753165Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7753664Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7754229Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7754827Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7755427Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7756039Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7756649Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7757234Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7757829Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7758440Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7758990Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_False (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7759473Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_True (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7759919Z 
test_ddp_ignore_params_arg (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7760312Z test_ddp_inference (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7760697Z test_ddp_join_model_equivalence (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7761106Z test_ddp_logging_data_cpu (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7761505Z test_ddp_logging_data_gpu (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7761929Z test_ddp_model_diff_num_params_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7762413Z test_ddp_model_diff_shape_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7762881Z test_ddp_multiple_nested_unused_params_err_ignore_params (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7763355Z test_ddp_multiple_nested_unused_params_error (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7763752Z test_ddp_namedtuple (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7764190Z test_ddp_new_tensor_in_fwd (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7764611Z test_ddp_new_tensor_in_fwd_static_graph (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7765033Z test_ddp_profiling_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7765468Z test_ddp_profiling_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7765885Z test_ddp_python_error_logged (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7766302Z test_ddp_returns_tensor_with_no_grad (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7766722Z test_ddp_shared_grad_acc_unused_params (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7767156Z test_ddp_static_graph_nested_types (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7767581Z test_ddp_sync_bn_training_vs_eval (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7767975Z test_ddp_sync_module_states (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7768392Z 
test_ddp_uneven_input_exception (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7768815Z test_ddp_uneven_input_join_disable (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7769223Z test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7769626Z test_ddp_uneven_inputs_stop_iteration_sync_bn (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7770090Z test_ddp_unused_params_rebuild_buckets_exception (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7770531Z test_ddp_zero_output_features (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7770922Z test_destroy_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7771310Z test_destroy_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7771714Z test_detect_ddp_is_actually_static (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7772118Z test_different_graph_across_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7772544Z test_dump_DDP_relevant_env_vars (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7773420Z test_gather (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7773872Z test_gather_checks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7774228Z test_gather_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7774613Z test_gather_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7774999Z test_gather_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7775357Z test_gather_object (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7775751Z test_gather_object_subgroup (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7776145Z test_get_backend (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7776496Z test_get_future (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7776857Z test_get_rank (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7777243Z test_get_rank_size_full_group (__main__.TestDistBackendWithSpawn) 
2023-01-11T22:12:51.7777645Z test_get_rank_size_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7778025Z test_invalid_static_graph (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7778399Z test_irecv (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7778747Z test_isend (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7779112Z test_isend_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7779519Z test_isend_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7779936Z test_monitored_barrier_allreduce_hang (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7780469Z test_monitored_barrier_allreduce_hang_wait_all_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7780924Z test_monitored_barrier_failure_order (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7781345Z test_monitored_barrier_gloo (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7781769Z test_monitored_barrier_gloo_rank_0_timeout (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7782256Z test_monitored_barrier_gloo_subgroup (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7782698Z test_monitored_barrier_wait_all_ranks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7783124Z test_nccl_backend_bool_allgather (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7783529Z test_nccl_backend_bool_allreduce (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7783950Z test_nccl_backend_bool_broadcast (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7784366Z test_nccl_backend_bool_reduce (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7784763Z test_nccl_high_priority_stream (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7785163Z test_new_subgroups (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7785568Z test_new_subgroups_by_enumeration (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7786042Z 
test_new_subgroups_by_enumeration_input_rank_exceeds_world_size (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7786520Z test_new_subgroups_by_enumeration_negative_input_rank (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7786999Z test_new_subgroups_group_size_exceeds_world_size (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7787459Z test_new_subgroups_overlap_not_allowed (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7787904Z test_new_subgroups_world_size_not_divisible_by_group_size (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7788363Z test_output_unused_in_loss_dict_module (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7788798Z test_output_unused_in_loss_tuple_module (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7789223Z test_periodic_model_averager (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7789637Z test_periodic_model_averager_param_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7790077Z test_post_localSGD_optimizer_parity (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7790528Z test_post_localSGD_optimizer_parity_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7790996Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7791514Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7792006Z test_post_localSGD_optimizer_step_reload (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7792429Z test_reduce_full_group_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7792809Z test_reduce_full_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7793215Z test_reduce_full_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7793625Z test_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7793999Z test_reduce_group_max 
(__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7794385Z test_reduce_group_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7794781Z test_reduce_group_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7795178Z test_reduce_group_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7795533Z test_reduce_max (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7795900Z test_reduce_min (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7796279Z test_reduce_multigpu (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7796644Z test_reduce_product (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7797044Z test_reduce_scatter_tensor_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7797456Z test_reduce_scatter_v_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7797881Z test_reduce_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7798260Z test_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7798648Z test_reduce_sum_cuda_twice (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7799018Z test_reduce_sum_twice (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7799385Z test_scatter (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7799800Z test_scatter_checks (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7800184Z test_scatter_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7800540Z test_scatter_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7800930Z test_scatter_cuda_complex (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7801326Z test_scatter_full_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7801692Z test_scatter_group (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7802077Z test_scatter_object_list (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7802456Z test_send_recv (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7802817Z test_send_recv_any_source 
(__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7803244Z test_send_recv_any_source_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7803694Z test_send_recv_any_source_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7804130Z test_send_recv_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7804516Z test_send_recv_nccl (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7804924Z test_send_recv_nccl_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7805353Z test_send_recv_nccl_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7805752Z test_send_recv_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7806151Z test_send_recv_with_tag (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7806564Z test_send_recv_with_tag_autograd_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7806992Z test_send_recv_with_tag_torch_profiler (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7807409Z test_sparse_all_reduce_sum (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7807816Z test_sparse_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7808226Z test_stateless_api_with_ddp (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7808613Z test_static_graph_api_cpu (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7809001Z test_sync_bn_logged (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7809423Z test_undefined_grad_parity_unused_parameters (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7809858Z test_verify_model_across_rank_with_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7810303Z test_verify_model_across_rank_without_logger (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.7811039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7811505Z 
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7812069Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7812544Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7812776Z 2023-01-11T22:12:51.7813499Z Running tests... 2023-01-11T22:12:51.7814045Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7814583Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.7815170Z test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.7815735Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20851 2023-01-11T22:12:51.7816275Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20852 2023-01-11T22:12:51.7816891Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7817346Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7817922Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7818459Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7819055Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7819501Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7820056Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7820524Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:12:51.7820982Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.7821479Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.7822124Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.7822821Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.7823344Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.7823818Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.7824327Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:12:51.7825194Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T22:12:51.7825867Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:12:51.7826696Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T22:12:51.7827339Z [1673473702.104018] [7e0e28e30a97:20851:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.7827882Z [1673473702.109941] [7e0e28e30a97:20852:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.7828392Z [1673473702.110947] [7e0e28e30a97:20851:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.7828879Z [1673473702.110947] [7e0e28e30a97:20851:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.7829337Z [1673473702.114870] [7e0e28e30a97:20852:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.7829810Z [1673473702.114870] [7e0e28e30a97:20852:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.7830337Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:12:51.7831172Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T22:12:51.7831819Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:12:51.7832635Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T22:12:51.7833361Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:12:51.7834192Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T22:12:51.7834882Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:12:51.7835707Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 
2023-01-11T22:12:51.7836359Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:12:51.7837171Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T22:12:51.7837814Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager:Model averaging hierarchy: 2023-01-11T22:12:51.7838633Z INFO:torch.distributed.algorithms.model_averaging.hierarchical_model_averager: Each group that has 2 processes average parameters every 4 iterations, if no higher-level averaging. 2023-01-11T22:12:51.7839115Z ok (6.012s) 2023-01-11T22:12:51.7839269Z 2023-01-11T22:12:51.7839543Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7839854Z Ran 1 test in 6.012s 2023-01-11T22:12:51.7840020Z 2023-01-11T22:12:51.7840116Z OK 2023-01-11T22:12:51.7840251Z 2023-01-11T22:12:51.7840377Z Generating XML reports... 2023-01-11T22:12:51.7840967Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214816.xml 2023-01-11T22:12:51.7841686Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7842143Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7842717Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7843171Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7843408Z 2023-01-11T22:12:51.7843524Z Running tests... 
2023-01-11T22:12:51.7843931Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7844440Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.7844969Z test_3_level_hierarchical_model_averager (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.003s) 2023-01-11T22:12:51.7845280Z 2023-01-11T22:12:51.7845550Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7845887Z Ran 1 test in 0.004s 2023-01-11T22:12:51.7846052Z 2023-01-11T22:12:51.7846143Z OK (skipped=1) 2023-01-11T22:12:51.7846300Z 2023-01-11T22:12:51.7846425Z Generating XML reports... 2023-01-11T22:12:51.7847031Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214825.xml 2023-01-11T22:12:51.7847748Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7848185Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7848764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7849235Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7849474Z 2023-01-11T22:12:51.7849564Z Running tests... 2023-01-11T22:12:51.7849969Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7850578Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.7851089Z test_Backend_enum_class (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.7851563Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20999 2023-01-11T22:12:51.7852083Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21000 2023-01-11T22:12:51.7852707Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7853844Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7854439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7854909Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7855497Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7855924Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7856499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7856966Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7857407Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.7857903Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.7858594Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.7859289Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.7859796Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.7860267Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.7860610Z ok (4.229s) 2023-01-11T22:12:51.7860764Z 2023-01-11T22:12:51.7861038Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7861348Z Ran 1 test in 4.230s 2023-01-11T22:12:51.7861515Z 2023-01-11T22:12:51.7861613Z OK 2023-01-11T22:12:51.7861750Z 2023-01-11T22:12:51.7861879Z Generating XML reports... 2023-01-11T22:12:51.7862464Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214827.xml 2023-01-11T22:12:51.7863176Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7863631Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7864213Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7864669Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7864898Z 2023-01-11T22:12:51.7865011Z Running tests... 2023-01-11T22:12:51.7865420Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7865935Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.7866469Z test_DistributedDataParallel (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.7867519Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77317 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.616s) 2023-01-11T22:12:51.7868152Z 2023-01-11T22:12:51.7868433Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7868764Z Ran 1 test in 1.617s 2023-01-11T22:12:51.7868907Z 2023-01-11T22:12:51.7869019Z OK (skipped=1) 2023-01-11T22:12:51.7869174Z 2023-01-11T22:12:51.7869302Z Generating XML reports... 2023-01-11T22:12:51.7869906Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214834.xml 2023-01-11T22:12:51.7870667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7871132Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7871714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7872184Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7872416Z 2023-01-11T22:12:51.7872513Z Running tests... 2023-01-11T22:12:51.7872925Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7873457Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.7873993Z test_DistributedDataParallelCPU (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.7874494Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21136 2023-01-11T22:12:51.7874947Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21137 2023-01-11T22:12:51.7875564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7875996Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7876573Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7877048Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7877629Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7878055Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7878626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7879096Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7879536Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.7880035Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.7880695Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.7881381Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.7881891Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.7882364Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.7882845Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7883336Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7883844Z [1673473722.597174] [7e0e28e30a97:21136:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.7884355Z [1673473723.386340] [7e0e28e30a97:21136:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.7884834Z [1673473723.386340] [7e0e28e30a97:21136:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.7885421Z [1673473722.620510] [7e0e28e30a97:21137:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.7885905Z [1673473723.404466] [7e0e28e30a97:21137:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.7886418Z [1673473723.404466] [7e0e28e30a97:21137:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.7886765Z ok (5.532s) 2023-01-11T22:12:51.7886915Z 2023-01-11T22:12:51.7887198Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7887508Z Ran 1 test in 5.532s 2023-01-11T22:12:51.7887670Z 2023-01-11T22:12:51.7887766Z OK 2023-01-11T22:12:51.7887901Z 2023-01-11T22:12:51.7888027Z Generating XML reports... 
2023-01-11T22:12:51.7888614Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214838.xml 2023-01-11T22:12:51.7889336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7889792Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7890370Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7890829Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7891061Z 2023-01-11T22:12:51.7891173Z Running tests... 2023-01-11T22:12:51.7891579Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7892090Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.7892643Z test_DistributedDataParallelCPU_grad_is_view (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.7893565Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21250 2023-01-11T22:12:51.7894018Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21251 2023-01-11T22:12:51.7894615Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7895067Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7895649Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7896101Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7896678Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7897122Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7897694Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7898146Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7898599Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.7899096Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.7899759Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.7900433Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.7900953Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.7901425Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.7901885Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7902468Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7902993Z [1673473730.702410] [7e0e28e30a97:21251:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.7903559Z [1673473731.474803] [7e0e28e30a97:21251:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.7904026Z [1673473731.474803] [7e0e28e30a97:21251:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.7904542Z [1673473730.681806] [7e0e28e30a97:21250:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.7905043Z [1673473731.485732] [7e0e28e30a97:21250:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.7905515Z [1673473731.485732] [7e0e28e30a97:21250:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.7905842Z ok (5.536s) 2023-01-11T22:12:51.7905991Z 2023-01-11T22:12:51.7906272Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7906603Z Ran 1 test in 5.537s 2023-01-11T22:12:51.7906765Z 2023-01-11T22:12:51.7906862Z OK 2023-01-11T22:12:51.7906978Z 2023-01-11T22:12:51.7907108Z Generating XML reports... 
2023-01-11T22:12:51.7907715Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214846.xml 2023-01-11T22:12:51.7908428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7908859Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7909441Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7909915Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7910145Z 2023-01-11T22:12:51.7910258Z Running tests... 2023-01-11T22:12:51.7910642Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7911174Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.7911729Z test_DistributedDataParallel_SyncBatchNorm (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.7912241Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21364 2023-01-11T22:12:51.7912691Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21365 2023-01-11T22:12:51.7913292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7913744Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7914304Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7914771Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7915352Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7915782Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7916354Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7916816Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7917273Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.7917756Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.7918489Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.7919175Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.7919697Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.7920203Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.7920725Z [1673473739.519584] [7e0e28e30a97:21365:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.7921233Z [1673473739.525186] [7e0e28e30a97:21365:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.7921712Z [1673473739.525186] [7e0e28e30a97:21365:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.7922212Z [1673473739.516359] [7e0e28e30a97:21364:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.7922718Z [1673473739.521680] [7e0e28e30a97:21364:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.7923191Z [1673473739.521680] [7e0e28e30a97:21364:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.7923671Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7924136Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7924619Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7925102Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7925561Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7926073Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:12:51.7926549Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7927021Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7927478Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7927950Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7928415Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7928869Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7929221Z ok (6.444s) 2023-01-11T22:12:51.7929370Z 2023-01-11T22:12:51.7929659Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7929993Z Ran 1 test in 6.445s 2023-01-11T22:12:51.7930136Z 2023-01-11T22:12:51.7930233Z OK 2023-01-11T22:12:51.7930370Z 2023-01-11T22:12:51.7930498Z Generating XML reports... 2023-01-11T22:12:51.7931104Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214854.xml 2023-01-11T22:12:51.7931805Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7932262Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7932841Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7933740Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7933975Z 2023-01-11T22:12:51.7934066Z Running tests... 
2023-01-11T22:12:51.7934483Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7935117Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.7935661Z test_DistributedDataParallel_SyncBatchNorm_2D_Input (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.7936201Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21482 2023-01-11T22:12:51.7936729Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21483 2023-01-11T22:12:51.7937357Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7937790Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7938365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7938835Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7939419Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7939847Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7940421Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7940893Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7941328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.7941823Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.7942474Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.7943170Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.7943678Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.7944150Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.7944670Z [1673473748.364178] [7e0e28e30a97:21482:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.7945187Z [1673473748.369899] [7e0e28e30a97:21482:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.7945647Z [1673473748.369899] [7e0e28e30a97:21482:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.7946157Z [1673473748.370580] [7e0e28e30a97:21483:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.7946665Z [1673473748.377009] [7e0e28e30a97:21483:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.7947134Z [1673473748.377009] [7e0e28e30a97:21483:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.7947595Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7948085Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7948569Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7949047Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:12:51.7949378Z ok (5.414s) 2023-01-11T22:12:51.7949527Z 2023-01-11T22:12:51.7949803Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7950131Z Ran 1 test in 5.414s 2023-01-11T22:12:51.7950295Z 2023-01-11T22:12:51.7950439Z OK 2023-01-11T22:12:51.7950575Z 2023-01-11T22:12:51.7950703Z Generating XML reports... 2023-01-11T22:12:51.7951317Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214903.xml 2023-01-11T22:12:51.7952031Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7952463Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7953087Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7953565Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7953795Z 2023-01-11T22:12:51.7953886Z Running tests... 2023-01-11T22:12:51.7954293Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7954821Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.7955397Z test_DistributedDataParallel_SyncBatchNorm_Channels_Last (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.7955923Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21600 2023-01-11T22:12:51.7956375Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21601 2023-01-11T22:12:51.7956985Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7957417Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7957992Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7958462Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7959044Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7959473Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7960049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7960516Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7960951Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.7961450Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.7962107Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.7962793Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.7963295Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.7963770Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.7964282Z [1673473756.416815] [7e0e28e30a97:21601:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.7964794Z [1673473756.422327] [7e0e28e30a97:21601:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.7965252Z [1673473756.422327] [7e0e28e30a97:21601:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.7965766Z [1673473756.410695] [7e0e28e30a97:21600:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.7966268Z [1673473756.415971] [7e0e28e30a97:21600:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.7966735Z [1673473756.415971] [7e0e28e30a97:21600:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.7967258Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7967744Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7968222Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7968752Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7969212Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7969683Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:12:51.7970163Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7970635Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7970968Z ok (5.625s) 2023-01-11T22:12:51.7971117Z 2023-01-11T22:12:51.7971401Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7971732Z Ran 1 test in 5.625s 2023-01-11T22:12:51.7971892Z 2023-01-11T22:12:51.7971970Z OK 2023-01-11T22:12:51.7972104Z 2023-01-11T22:12:51.7972231Z Generating XML reports... 2023-01-11T22:12:51.7972838Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214911.xml 2023-01-11T22:12:51.7974067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7974502Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7975080Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7975551Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7975789Z 2023-01-11T22:12:51.7975880Z Running tests... 2023-01-11T22:12:51.7976286Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7976810Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.7977399Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.7977951Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21718 2023-01-11T22:12:51.7978401Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21719 2023-01-11T22:12:51.7979010Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7979442Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7980014Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7980486Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7981065Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7981488Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7982064Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7982531Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7982966Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.7983454Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.7984106Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.7984905Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.7985409Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.7985882Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.7986456Z [1673473764.629769] [7e0e28e30a97:21719:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.7986977Z [1673473764.635242] [7e0e28e30a97:21719:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.7987436Z [1673473764.635242] [7e0e28e30a97:21719:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.7987950Z [1673473764.621136] [7e0e28e30a97:21718:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.7988452Z [1673473764.627685] [7e0e28e30a97:21718:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.7988920Z [1673473764.627685] [7e0e28e30a97:21718:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.7989377Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7989858Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.7990211Z ok (5.770s) 2023-01-11T22:12:51.7990359Z 2023-01-11T22:12:51.7990641Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7990953Z Ran 1 test in 5.770s 2023-01-11T22:12:51.7991115Z 2023-01-11T22:12:51.7991210Z OK 2023-01-11T22:12:51.7991345Z 2023-01-11T22:12:51.7991475Z Generating XML reports... 
2023-01-11T22:12:51.7992063Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214919.xml 2023-01-11T22:12:51.7992780Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7993233Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7993810Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7994259Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7994489Z 2023-01-11T22:12:51.7994599Z Running tests... 2023-01-11T22:12:51.7995004Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.7995508Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.7996094Z test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.7996655Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21836 2023-01-11T22:12:51.7997104Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21837 2023-01-11T22:12:51.7997692Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.7998143Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.7998721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.7999188Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.7999747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8000191Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8000840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8001288Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8001740Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8002286Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8002957Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8003626Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8004145Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8004616Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8005134Z [1673473772.926544] [7e0e28e30a97:21836:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8005627Z [1673473772.933391] [7e0e28e30a97:21836:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8006105Z [1673473772.933391] [7e0e28e30a97:21836:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8006617Z [1673473772.927714] [7e0e28e30a97:21837:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8007118Z [1673473772.934295] [7e0e28e30a97:21837:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8007567Z [1673473772.934295] [7e0e28e30a97:21837:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8008041Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8008525Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8008986Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8009466Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:12:51.8009815Z ok (6.141s) 2023-01-11T22:12:51.8009964Z 2023-01-11T22:12:51.8010239Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8010550Z Ran 1 test in 6.141s 2023-01-11T22:12:51.8010711Z 2023-01-11T22:12:51.8010806Z OK 2023-01-11T22:12:51.8010939Z 2023-01-11T22:12:51.8011065Z Generating XML reports... 2023-01-11T22:12:51.8011652Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214928.xml 2023-01-11T22:12:51.8012372Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8012820Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8013608Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8014068Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8014298Z 2023-01-11T22:12:51.8014410Z Running tests... 2023-01-11T22:12:51.8014815Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8015326Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8015889Z test_DistributedDataParallel_SyncBatchNorm_No_Affine (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8016431Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21954 2023-01-11T22:12:51.8016972Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21955 2023-01-11T22:12:51.8017568Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8018017Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8018660Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8019146Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8019713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8020160Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8020732Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8021184Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8021638Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8022133Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8022790Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8023459Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8023979Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8024510Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8054126Z [1673473781.546066] [7e0e28e30a97:21954:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8054746Z [1673473781.552691] [7e0e28e30a97:21954:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8055215Z [1673473781.552691] [7e0e28e30a97:21954:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8055735Z [1673473781.546340] [7e0e28e30a97:21955:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8056242Z [1673473781.551495] [7e0e28e30a97:21955:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8056710Z [1673473781.551495] [7e0e28e30a97:21955:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8057174Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8057662Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8058146Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8058628Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:12:51.8058960Z ok (5.933s) 2023-01-11T22:12:51.8059108Z 2023-01-11T22:12:51.8059430Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8059767Z Ran 1 test in 5.933s 2023-01-11T22:12:51.8059911Z 2023-01-11T22:12:51.8060008Z OK 2023-01-11T22:12:51.8060143Z 2023-01-11T22:12:51.8060271Z Generating XML reports... 2023-01-11T22:12:51.8060881Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214936.xml 2023-01-11T22:12:51.8061579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8062201Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8062786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8063261Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8063495Z 2023-01-11T22:12:51.8063588Z Running tests... 2023-01-11T22:12:51.8064060Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8064617Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8065205Z test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8065747Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22072 2023-01-11T22:12:51.8066197Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22073 2023-01-11T22:12:51.8066812Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8067246Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8067824Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8068294Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8068872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8069294Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8069861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8070321Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8070759Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8071254Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8071905Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8072593Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8073100Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8073576Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8074093Z [1673473790.009158] [7e0e28e30a97:22072:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8074601Z [1673473790.016414] [7e0e28e30a97:22072:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8075060Z [1673473790.016414] [7e0e28e30a97:22072:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8075569Z [1673473790.016770] [7e0e28e30a97:22073:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8076069Z [1673473790.023401] [7e0e28e30a97:22073:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8076536Z [1673473790.023401] [7e0e28e30a97:22073:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8076996Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8077477Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8078044Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8078525Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:12:51.8078859Z ok (5.546s) 2023-01-11T22:12:51.8079004Z 2023-01-11T22:12:51.8079290Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8079623Z Ran 1 test in 5.546s 2023-01-11T22:12:51.8079786Z 2023-01-11T22:12:51.8079862Z OK 2023-01-11T22:12:51.8080066Z 2023-01-11T22:12:51.8080202Z Generating XML reports... 2023-01-11T22:12:51.8080817Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214945.xml 2023-01-11T22:12:51.8081531Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8081962Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8082532Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8083001Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8083230Z 2023-01-11T22:12:51.8083322Z Running tests... 2023-01-11T22:12:51.8083724Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8084246Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8084797Z test_DistributedDataParallel_non_default_stream (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8085845Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/76428 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.619s) 2023-01-11T22:12:51.8086371Z 2023-01-11T22:12:51.8086643Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8086976Z Ran 1 test in 1.619s 2023-01-11T22:12:51.8087136Z 2023-01-11T22:12:51.8087244Z OK (skipped=1) 2023-01-11T22:12:51.8087381Z 2023-01-11T22:12:51.8087507Z Generating XML reports... 2023-01-11T22:12:51.8088109Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214953.xml 2023-01-11T22:12:51.8088822Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8089272Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8089834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8090299Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8090525Z 2023-01-11T22:12:51.8090633Z Running tests... 2023-01-11T22:12:51.8091026Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8091550Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8092090Z test_DistributedDataParallel_requires_grad (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8092612Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22224 2023-01-11T22:12:51.8093262Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22225 2023-01-11T22:12:51.8093883Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8094334Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8094892Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8095355Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8096047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8096495Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8097044Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8097577Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8098053Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8098536Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8099199Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8099878Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8100413Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8100865Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8101200Z ok (4.335s) 2023-01-11T22:12:51.8101349Z 2023-01-11T22:12:51.8101619Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8101948Z Ran 1 test in 4.335s 2023-01-11T22:12:51.8102092Z 2023-01-11T22:12:51.8102187Z OK 2023-01-11T22:12:51.8102322Z 2023-01-11T22:12:51.8102448Z Generating XML reports... 2023-01-11T22:12:51.8103052Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214957.xml 2023-01-11T22:12:51.8103743Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8104196Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8104765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8105229Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8105440Z 2023-01-11T22:12:51.8105551Z Running tests... 2023-01-11T22:12:51.8105950Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8106479Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8107019Z test_DistributedDataParallel_with_amp_and_grad_is_view (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8108084Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77294 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.626s) 2023-01-11T22:12:51.8108600Z 2023-01-11T22:12:51.8108867Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8109201Z Ran 1 test in 1.626s 2023-01-11T22:12:51.8109362Z 2023-01-11T22:12:51.8109453Z OK (skipped=1) 2023-01-11T22:12:51.8109609Z 2023-01-11T22:12:51.8109733Z Generating XML reports... 2023-01-11T22:12:51.8110335Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215004.xml 2023-01-11T22:12:51.8111042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8111476Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8112047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8112513Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8112819Z 2023-01-11T22:12:51.8112933Z Running tests... 2023-01-11T22:12:51.8113325Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8113852Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8114378Z test_DistributedSampler_padding (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8114921Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22361 2023-01-11T22:12:51.8115387Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22362 2023-01-11T22:12:51.8116001Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8116453Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8117007Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8117474Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8118047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8118471Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8119041Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8119503Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8119951Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8120429Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8121079Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8121767Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8122284Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8122736Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8123248Z [1673473813.219959] [7e0e28e30a97:22362:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8123759Z [1673473813.225909] [7e0e28e30a97:22362:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8124236Z [1673473813.225909] [7e0e28e30a97:22362:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8124732Z [1673473813.212299] [7e0e28e30a97:22361:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8125231Z [1673473813.218994] [7e0e28e30a97:22361:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8125697Z [1673473813.218994] [7e0e28e30a97:22361:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8126035Z ok (5.445s) 2023-01-11T22:12:51.8126166Z 2023-01-11T22:12:51.8126448Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8126813Z Ran 1 test in 5.445s 2023-01-11T22:12:51.8126978Z 2023-01-11T22:12:51.8127075Z OK 2023-01-11T22:12:51.8127193Z 2023-01-11T22:12:51.8127322Z Generating XML reports... 
2023-01-11T22:12:51.8127927Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215008.xml 2023-01-11T22:12:51.8128642Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8129166Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8129734Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8130199Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8130431Z 2023-01-11T22:12:51.8130544Z Running tests... 2023-01-11T22:12:51.8130978Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8131527Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8132030Z test_SyncBatchNorm_process_group (__main__.TestDistBackendWithSpawn) ... skip: no torchvision (0.002s) 2023-01-11T22:12:51.8132318Z 2023-01-11T22:12:51.8132582Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8133175Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8133479Z 2023-01-11T22:12:51.8133597Z OK (skipped=1) 2023-01-11T22:12:51.8133756Z 2023-01-11T22:12:51.8133888Z Generating XML reports... 
2023-01-11T22:12:51.8134484Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215016.xml 2023-01-11T22:12:51.8135198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8135649Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8136227Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8136677Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8136905Z 2023-01-11T22:12:51.8137017Z Running tests... 2023-01-11T22:12:51.8137420Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8137926Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8138383Z test_accumulate_gradients_no_sync (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.8138886Z Runs _test_accumulate_gradients_no_sync using default inputs ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T22:12:51.8139192Z 2023-01-11T22:12:51.8139462Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8139775Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8139935Z 2023-01-11T22:12:51.8140043Z OK (skipped=1) 2023-01-11T22:12:51.8140198Z 2023-01-11T22:12:51.8140321Z Generating XML reports... 
2023-01-11T22:12:51.8140905Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215018.xml 2023-01-11T22:12:51.8141613Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8142063Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8142641Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8143092Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8143322Z 2023-01-11T22:12:51.8143430Z Running tests... 2023-01-11T22:12:51.8143830Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8144342Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8144816Z test_accumulate_gradients_no_sync_allreduce_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.8145335Z Runs multiple iterations on _test_accumulate_gradients_no_sync ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T22:12:51.8145639Z 2023-01-11T22:12:51.8145905Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8146349Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8146512Z 2023-01-11T22:12:51.8146622Z OK (skipped=1) 2023-01-11T22:12:51.8146778Z 2023-01-11T22:12:51.8146904Z Generating XML reports... 
2023-01-11T22:12:51.8147512Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215021.xml 2023-01-11T22:12:51.8148204Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8148720Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8149320Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8149773Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8150002Z 2023-01-11T22:12:51.8150111Z Running tests... 2023-01-11T22:12:51.8150514Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8151040Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8151515Z test_accumulate_gradients_no_sync_allreduce_with_then_hook (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.8152076Z Runs multiple iterations on _test_accumulate_gradients_no_sync using allreduce ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T22:12:51.8152402Z 2023-01-11T22:12:51.8152667Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8152996Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8153140Z 2023-01-11T22:12:51.8153250Z OK (skipped=1) 2023-01-11T22:12:51.8153403Z 2023-01-11T22:12:51.8153526Z Generating XML reports... 
2023-01-11T22:12:51.8154127Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215023.xml 2023-01-11T22:12:51.8154816Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8155268Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8155839Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8156304Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8156514Z 2023-01-11T22:12:51.8156623Z Running tests... 2023-01-11T22:12:51.8157032Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8157555Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8158011Z test_accumulate_gradients_no_sync_grad_is_view (__main__.TestDistBackendWithSpawn) 2023-01-11T22:12:51.8158526Z Runs _test_accumulate_gradients_no_sync using default inputs ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T22:12:51.8158825Z 2023-01-11T22:12:51.8159130Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8159472Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8159617Z 2023-01-11T22:12:51.8159725Z OK (skipped=1) 2023-01-11T22:12:51.8159879Z 2023-01-11T22:12:51.8160002Z Generating XML reports... 
2023-01-11T22:12:51.8160604Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215026.xml 2023-01-11T22:12:51.8161298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8161743Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8162313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8162784Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8163013Z 2023-01-11T22:12:51.8163106Z Running tests... 2023-01-11T22:12:51.8163507Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8164122Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8164596Z test_all_gather (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8165078Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22640 2023-01-11T22:12:51.8165525Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22641 2023-01-11T22:12:51.8166177Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8166621Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8167203Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8167672Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8168254Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8168681Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8169252Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8169716Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8170154Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8170649Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8171304Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8171992Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8172499Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8173691Z STAGE:2023-01-11 21:50:32 22640:22640 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8174180Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8174772Z STAGE:2023-01-11 21:50:32 22641:22641 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8175281Z [1673473832.357900] [7e0e28e30a97:22640:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8175797Z [1673473833.400859] [7e0e28e30a97:22640:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8176270Z [1673473833.400859] [7e0e28e30a97:22640:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8176786Z [1673473832.382031] [7e0e28e30a97:22641:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8177265Z [1673473833.417432] [7e0e28e30a97:22641:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8177729Z [1673473833.417432] [7e0e28e30a97:22641:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8178524Z STAGE:2023-01-11 21:50:33 22640:22640 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:50:33 22641:22641 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8178918Z 2023-01-11T22:12:51.8179272Z STAGE:2023-01-11 21:50:33 22641:22641 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8179857Z STAGE:2023-01-11 21:50:33 22640:22640 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8180566Z STAGE:2023-01-11 21:50:33 22641:22641 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8181132Z STAGE:2023-01-11 21:50:33 22640:22640 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8181714Z STAGE:2023-01-11 21:50:33 22641:22641 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8182332Z STAGE:2023-01-11 21:50:33 22640:22640 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8182940Z STAGE:2023-01-11 21:50:33 22641:22641 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8183544Z STAGE:2023-01-11 21:50:33 22640:22640 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8183876Z ok (5.838s) 2023-01-11T22:12:51.8184025Z 2023-01-11T22:12:51.8184290Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8184614Z Ran 1 test in 5.838s 2023-01-11T22:12:51.8184780Z 2023-01-11T22:12:51.8184874Z OK 2023-01-11T22:12:51.8184990Z 2023-01-11T22:12:51.8185114Z Generating XML reports... 
2023-01-11T22:12:51.8185716Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215028.xml 2023-01-11T22:12:51.8186430Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8186868Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8187438Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8187901Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8188131Z 2023-01-11T22:12:51.8188240Z Running tests... 2023-01-11T22:12:51.8188627Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8189149Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8189687Z test_all_gather_coalesced_complex (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T22:12:51.8190002Z 2023-01-11T22:12:51.8190265Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8190572Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8190732Z 2023-01-11T22:12:51.8190838Z OK (skipped=1) 2023-01-11T22:12:51.8190992Z 2023-01-11T22:12:51.8191118Z Generating XML reports... 
2023-01-11T22:12:51.8191699Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215036.xml 2023-01-11T22:12:51.8192408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8192854Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8193426Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8193881Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8194112Z 2023-01-11T22:12:51.8194219Z Running tests... 2023-01-11T22:12:51.8194623Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8195128Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8195669Z test_all_gather_coalesced_full_group (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T22:12:51.8195985Z 2023-01-11T22:12:51.8196248Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8196569Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8196712Z 2023-01-11T22:12:51.8196820Z OK (skipped=1) 2023-01-11T22:12:51.8196975Z 2023-01-11T22:12:51.8197102Z Generating XML reports... 
2023-01-11T22:12:51.8197700Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215039.xml 2023-01-11T22:12:51.8198465Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8198920Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8199495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8200024Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8200265Z 2023-01-11T22:12:51.8200357Z Running tests... 2023-01-11T22:12:51.8200766Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8201290Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8201819Z test_all_gather_coalesced_group (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T22:12:51.8202117Z 2023-01-11T22:12:51.8202379Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8202699Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8202858Z 2023-01-11T22:12:51.8202965Z OK (skipped=1) 2023-01-11T22:12:51.8203127Z 2023-01-11T22:12:51.8203233Z Generating XML reports... 
2023-01-11T22:12:51.8203837Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215041.xml 2023-01-11T22:12:51.8204542Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8204998Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8205551Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8206019Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8206249Z 2023-01-11T22:12:51.8206358Z Running tests... 2023-01-11T22:12:51.8206746Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8207270Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8207806Z test_all_gather_coalesced_simple (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T22:12:51.8208118Z 2023-01-11T22:12:51.8208384Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8208688Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8208847Z 2023-01-11T22:12:51.8208954Z OK (skipped=1) 2023-01-11T22:12:51.8209107Z 2023-01-11T22:12:51.8209229Z Generating XML reports... 
2023-01-11T22:12:51.8209806Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215043.xml 2023-01-11T22:12:51.8210514Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8210695Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8211073Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8211263Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8211282Z 2023-01-11T22:12:51.8211391Z Running tests... 2023-01-11T22:12:51.8211657Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8211970Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8212244Z test_all_gather_coalesced_with_empty (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support all_gather_coalesced (0.002s) 2023-01-11T22:12:51.8212263Z 2023-01-11T22:12:51.8212525Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8212636Z Ran 1 test in 0.003s 2023-01-11T22:12:51.8212714Z 2023-01-11T22:12:51.8212831Z OK (skipped=1) 2023-01-11T22:12:51.8212849Z 2023-01-11T22:12:51.8213179Z Generating XML reports... 
2023-01-11T22:12:51.8213640Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215046.xml 2023-01-11T22:12:51.8214013Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8214269Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8214671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8214844Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8214865Z 2023-01-11T22:12:51.8214976Z Running tests... 2023-01-11T22:12:51.8215241Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8215554Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8215820Z test_all_gather_complex (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8216041Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22919 2023-01-11T22:12:51.8216258Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22920 2023-01-11T22:12:51.8216638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8216797Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8217176Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8217367Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8217731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8217907Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8218285Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8218473Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8218718Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8218961Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8219344Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8219738Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8219966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8220306Z STAGE:2023-01-11 21:50:52 22919:22919 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8220530Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8220864Z STAGE:2023-01-11 21:50:52 22920:22920 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8221146Z [1673473852.737501] [7e0e28e30a97:22920:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8221382Z [1673473853.777990] [7e0e28e30a97:22920:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8221621Z [1673473853.777990] [7e0e28e30a97:22920:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8221948Z STAGE:2023-01-11 21:50:54 22920:22920 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8222303Z [1673473852.717163] [7e0e28e30a97:22919:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8222535Z [1673473853.773719] [7e0e28e30a97:22919:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8222814Z [1673473853.773719] [7e0e28e30a97:22919:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8223170Z STAGE:2023-01-11 21:50:54 22919:22919 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8223523Z STAGE:2023-01-11 21:50:54 22920:22920 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8223870Z STAGE:2023-01-11 21:50:54 22919:22919 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8224198Z STAGE:2023-01-11 21:50:54 22920:22920 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8224519Z STAGE:2023-01-11 21:50:54 22919:22919 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8224831Z STAGE:2023-01-11 21:50:54 22920:22920 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8225155Z STAGE:2023-01-11 21:50:54 22919:22919 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8225501Z STAGE:2023-01-11 21:50:54 22920:22920 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8225842Z STAGE:2023-01-11 21:50:54 22919:22919 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8225945Z ok (5.851s) 2023-01-11T22:12:51.8225965Z 2023-01-11T22:12:51.8226234Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8226347Z Ran 1 test in 5.851s 2023-01-11T22:12:51.8226366Z 2023-01-11T22:12:51.8226458Z OK 2023-01-11T22:12:51.8226481Z 2023-01-11T22:12:51.8226606Z Generating XML reports... 
2023-01-11T22:12:51.8227035Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215048.xml 2023-01-11T22:12:51.8227437Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8227615Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8227999Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8228196Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8228215Z 2023-01-11T22:12:51.8228325Z Running tests... 2023-01-11T22:12:51.8228588Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8228896Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8229161Z test_all_gather_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all gather (0.002s) 2023-01-11T22:12:51.8229182Z 2023-01-11T22:12:51.8229422Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8229535Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8229554Z 2023-01-11T22:12:51.8229662Z OK (skipped=1) 2023-01-11T22:12:51.8229682Z 2023-01-11T22:12:51.8229806Z Generating XML reports... 
2023-01-11T22:12:51.8230255Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215057.xml 2023-01-11T22:12:51.8230623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8230800Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8231178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8231417Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8231457Z 2023-01-11T22:12:51.8231549Z Running tests... 2023-01-11T22:12:51.8231821Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8232130Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8232454Z test_all_gather_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all gather (0.002s) 2023-01-11T22:12:51.8232476Z 2023-01-11T22:12:51.8232751Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8232866Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8232886Z 2023-01-11T22:12:51.8232995Z OK (skipped=1) 2023-01-11T22:12:51.8233014Z 2023-01-11T22:12:51.8233140Z Generating XML reports... 
2023-01-11T22:12:51.8233567Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215059.xml 2023-01-11T22:12:51.8233944Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8234119Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8234494Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8234686Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8234709Z 2023-01-11T22:12:51.8234818Z Running tests... 2023-01-11T22:12:51.8235081Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8235389Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8235648Z test_all_gather_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8235851Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23099 2023-01-11T22:12:51.8236072Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23100 2023-01-11T22:12:51.8236440Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8236615Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8236991Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8237185Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8237547Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8237720Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8238075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8238262Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8238512Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8238759Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8239157Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8239552Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8239780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8240007Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8240246Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.8240467Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.8240937Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8241329Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8241711Z STAGE:2023-01-11 21:51:05 23100:23100 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8242045Z STAGE:2023-01-11 21:51:05 23099:23099 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8242326Z [1673473865.825533] [7e0e28e30a97:23100:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8242562Z [1673473866.850551] [7e0e28e30a97:23100:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8242807Z [1673473866.850551] [7e0e28e30a97:23100:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8243079Z [1673473865.805232] [7e0e28e30a97:23099:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8243310Z [1673473866.839808] [7e0e28e30a97:23099:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8243531Z [1673473866.839808] [7e0e28e30a97:23099:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8244085Z STAGE:2023-01-11 21:51:07 23100:23100 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:51:07 23099:23099 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8244107Z 2023-01-11T22:12:51.8244676Z STAGE:2023-01-11 21:51:07 23100:23100 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:51:07 23099:23099 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8244699Z 2023-01-11T22:12:51.8245028Z STAGE:2023-01-11 21:51:07 23099:23099 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8245351Z STAGE:2023-01-11 21:51:07 23100:23100 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8245689Z STAGE:2023-01-11 21:51:07 23099:23099 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8246026Z STAGE:2023-01-11 21:51:07 23100:23100 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8246373Z STAGE:2023-01-11 21:51:07 23099:23099 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8246719Z STAGE:2023-01-11 21:51:07 23100:23100 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8246822Z ok (5.757s) 2023-01-11T22:12:51.8246847Z 2023-01-11T22:12:51.8247113Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8247209Z Ran 1 test in 5.758s 2023-01-11T22:12:51.8247228Z 2023-01-11T22:12:51.8247321Z OK 2023-01-11T22:12:51.8247340Z 2023-01-11T22:12:51.8247464Z Generating XML reports... 
2023-01-11T22:12:51.8247912Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215101.xml 2023-01-11T22:12:51.8248285Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8248463Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8248845Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8249038Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8249058Z 2023-01-11T22:12:51.8249150Z Running tests... 2023-01-11T22:12:51.8249482Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8249796Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8250054Z test_all_gather_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8250276Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23213 2023-01-11T22:12:51.8250540Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23214 2023-01-11T22:12:51.8250927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8251106Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8251473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8251632Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8252015Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8252206Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8252583Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8252778Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8253411Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8253688Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8254101Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8254478Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8254717Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8254945Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8255102Z skip: Skipped due to small world size. (4.343s) 2023-01-11T22:12:51.8255123Z 2023-01-11T22:12:51.8255390Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8255506Z Ran 1 test in 4.343s 2023-01-11T22:12:51.8255525Z 2023-01-11T22:12:51.8255633Z OK (skipped=1) 2023-01-11T22:12:51.8255652Z 2023-01-11T22:12:51.8255776Z Generating XML reports... 2023-01-11T22:12:51.8256221Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215110.xml 2023-01-11T22:12:51.8256574Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8256754Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8257131Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8257321Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8257341Z 2023-01-11T22:12:51.8257453Z Running tests... 2023-01-11T22:12:51.8257719Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8258036Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8258335Z test_all_gather_into_cat_tensor_cuda (__main__.TestDistBackendWithSpawn) ... 
skip: Only Nccl supports CUDA all_gather_into_tensor (0.002s) 2023-01-11T22:12:51.8258355Z 2023-01-11T22:12:51.8258616Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8258710Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8258729Z 2023-01-11T22:12:51.8258940Z OK (skipped=1) 2023-01-11T22:12:51.8258959Z 2023-01-11T22:12:51.8259087Z Generating XML reports... 2023-01-11T22:12:51.8259541Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215117.xml 2023-01-11T22:12:51.8259915Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8260094Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8260536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8260740Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8260760Z 2023-01-11T22:12:51.8260870Z Running tests... 2023-01-11T22:12:51.8261125Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8261437Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8261743Z test_all_gather_into_stack_tensor_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_gather_into_tensor (0.002s) 2023-01-11T22:12:51.8261763Z 2023-01-11T22:12:51.8262018Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8262132Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8262152Z 2023-01-11T22:12:51.8262264Z OK (skipped=1) 2023-01-11T22:12:51.8262283Z 2023-01-11T22:12:51.8262417Z Generating XML reports... 
2023-01-11T22:12:51.8262856Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215119.xml 2023-01-11T22:12:51.8263207Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8263382Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8263757Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8263950Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8263969Z 2023-01-11T22:12:51.8264077Z Running tests... 2023-01-11T22:12:51.8264339Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8264652Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8264940Z test_all_gather_multigpu (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl backend supports allgather multigpu (0.002s) 2023-01-11T22:12:51.8264959Z 2023-01-11T22:12:51.8265218Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8265311Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8265346Z 2023-01-11T22:12:51.8265437Z OK (skipped=1) 2023-01-11T22:12:51.8265455Z 2023-01-11T22:12:51.8265578Z Generating XML reports... 
2023-01-11T22:12:51.8266017Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215121.xml 2023-01-11T22:12:51.8266390Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8266567Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8266945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8267137Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8267157Z 2023-01-11T22:12:51.8267265Z Running tests... 2023-01-11T22:12:51.8267511Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8267821Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8268119Z test_all_gather_multigpu_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl backend supports allgather multigpu (0.002s) 2023-01-11T22:12:51.8268196Z 2023-01-11T22:12:51.8268472Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8268586Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8268605Z 2023-01-11T22:12:51.8268713Z OK (skipped=1) 2023-01-11T22:12:51.8268732Z 2023-01-11T22:12:51.8268858Z Generating XML reports... 
2023-01-11T22:12:51.8269302Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215124.xml 2023-01-11T22:12:51.8269731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8269897Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8270280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8270472Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8270492Z 2023-01-11T22:12:51.8270610Z Running tests... 2023-01-11T22:12:51.8270872Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8271182Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8271453Z test_all_gather_object_default_pg (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8271674Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23448 2023-01-11T22:12:51.8271877Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23449 2023-01-11T22:12:51.8272248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8272423Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8272799Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8272993Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8273358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8273529Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8273906Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8274096Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8274325Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8274568Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8274968Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8275362Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8275595Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8275822Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8276102Z [1673473890.605584] [7e0e28e30a97:23448:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8276336Z [1673473891.401360] [7e0e28e30a97:23448:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8276578Z [1673473891.401360] [7e0e28e30a97:23448:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8276834Z [1673473890.626044] [7e0e28e30a97:23449:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8277127Z [1673473891.390101] [7e0e28e30a97:23449:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8277366Z [1673473891.390101] [7e0e28e30a97:23449:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8277474Z ok (5.948s) 2023-01-11T22:12:51.8277494Z 2023-01-11T22:12:51.8277817Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8277939Z Ran 1 test in 5.948s 2023-01-11T22:12:51.8277958Z 2023-01-11T22:12:51.8278054Z OK 2023-01-11T22:12:51.8278072Z 2023-01-11T22:12:51.8278196Z Generating XML reports... 
2023-01-11T22:12:51.8278648Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215126.xml 2023-01-11T22:12:51.8279000Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8279183Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8279563Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8279754Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8279773Z 2023-01-11T22:12:51.8279881Z Running tests... 2023-01-11T22:12:51.8280146Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8280456Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8280728Z test_all_gather_object_subgroup (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8280947Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23559 2023-01-11T22:12:51.8281148Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23560 2023-01-11T22:12:51.8281516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8281696Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8282072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8282263Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8282627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8282799Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8283170Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8283342Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8283584Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8283829Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8284228Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8284622Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8284853Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8285082Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8285320Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.8285559Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.8285937Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8286395Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8286635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:12:51.8286923Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:12:51.8287328Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:12:51.8287721Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:12:51.8287962Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:12:51.8288201Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:12:51.8288590Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:12:51.8288980Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 
2023-01-11T22:12:51.8289242Z [1673473899.093179] [7e0e28e30a97:23559:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8289475Z [1673473899.869873] [7e0e28e30a97:23559:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8289716Z [1673473899.869873] [7e0e28e30a97:23559:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8289988Z [1673473899.114914] [7e0e28e30a97:23560:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8290225Z [1673473899.895906] [7e0e28e30a97:23560:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8290461Z [1673473899.895906] [7e0e28e30a97:23560:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8290564Z ok (6.435s) 2023-01-11T22:12:51.8290584Z 2023-01-11T22:12:51.8290856Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8290970Z Ran 1 test in 6.435s 2023-01-11T22:12:51.8290990Z 2023-01-11T22:12:51.8291065Z OK 2023-01-11T22:12:51.8291101Z 2023-01-11T22:12:51.8291208Z Generating XML reports... 
2023-01-11T22:12:51.8291655Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215135.xml 2023-01-11T22:12:51.8292022Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8292203Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8292583Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8292774Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8292794Z 2023-01-11T22:12:51.8293050Z Running tests... 2023-01-11T22:12:51.8293332Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8293629Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8293893Z test_all_gather_v_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports all_gather_v (0.002s) 2023-01-11T22:12:51.8293913Z 2023-01-11T22:12:51.8294173Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8294285Z Ran 1 test in 0.003s 2023-01-11T22:12:51.8294304Z 2023-01-11T22:12:51.8294503Z OK (skipped=1) 2023-01-11T22:12:51.8294523Z 2023-01-11T22:12:51.8294653Z Generating XML reports... 
2023-01-11T22:12:51.8295105Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215144.xml 2023-01-11T22:12:51.8295475Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8295654Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8296074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8296278Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8296298Z 2023-01-11T22:12:51.8296408Z Running tests... 2023-01-11T22:12:51.8296677Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8296988Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8297409Z test_all_reduce_coalesced_full_group_max (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.8297430Z 2023-01-11T22:12:51.8297684Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8297796Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8297815Z 2023-01-11T22:12:51.8297905Z OK (skipped=1) 2023-01-11T22:12:51.8297942Z 2023-01-11T22:12:51.8298052Z Generating XML reports... 
2023-01-11T22:12:51.8298494Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215146.xml 2023-01-11T22:12:51.8298863Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8299038Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8299415Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8299610Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8299629Z 2023-01-11T22:12:51.8299737Z Running tests... 2023-01-11T22:12:51.8299997Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8300289Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8300707Z test_all_reduce_coalesced_full_group_min (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.8300727Z 2023-01-11T22:12:51.8300984Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8301095Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8301115Z 2023-01-11T22:12:51.8301221Z OK (skipped=1) 2023-01-11T22:12:51.8301241Z 2023-01-11T22:12:51.8301364Z Generating XML reports... 
2023-01-11T22:12:51.8301802Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215148.xml 2023-01-11T22:12:51.8302175Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8302351Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8302709Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8302905Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8302924Z 2023-01-11T22:12:51.8303030Z Running tests... 2023-01-11T22:12:51.8303292Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8303603Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8304029Z test_all_reduce_coalesced_full_group_product (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.8304107Z 2023-01-11T22:12:51.8304380Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8304494Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8304513Z 2023-01-11T22:12:51.8304623Z OK (skipped=1) 2023-01-11T22:12:51.8304642Z 2023-01-11T22:12:51.8304747Z Generating XML reports... 
2023-01-11T22:12:51.8305240Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215151.xml 2023-01-11T22:12:51.8305623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8305800Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8306176Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8306368Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8306391Z 2023-01-11T22:12:51.8306501Z Running tests... 2023-01-11T22:12:51.8306761Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8307055Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8307472Z test_all_reduce_coalesced_full_group_sum (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.8307491Z 2023-01-11T22:12:51.8307750Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8307860Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8307879Z 2023-01-11T22:12:51.8307986Z OK (skipped=1) 2023-01-11T22:12:51.8308005Z 2023-01-11T22:12:51.8308128Z Generating XML reports... 
2023-01-11T22:12:51.8308571Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215153.xml 2023-01-11T22:12:51.8308938Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8309118Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8309477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8309667Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8309687Z 2023-01-11T22:12:51.8309795Z Running tests... 2023-01-11T22:12:51.8310061Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8310371Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8310780Z test_all_reduce_coalesced_group_max (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.8310801Z 2023-01-11T22:12:51.8311060Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8311170Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8311192Z 2023-01-11T22:12:51.8311299Z OK (skipped=1) 2023-01-11T22:12:51.8311318Z 2023-01-11T22:12:51.8311424Z Generating XML reports... 
2023-01-11T22:12:51.8311863Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215156.xml 2023-01-11T22:12:51.8312230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8312410Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8312786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8312977Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8312997Z 2023-01-11T22:12:51.8313106Z Running tests... 2023-01-11T22:12:51.8313369Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8313749Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8314140Z test_all_reduce_coalesced_group_min (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.8314160Z 2023-01-11T22:12:51.8314421Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8314534Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8314554Z 2023-01-11T22:12:51.8314748Z OK (skipped=1) 2023-01-11T22:12:51.8314769Z 2023-01-11T22:12:51.8314899Z Generating XML reports... 
2023-01-11T22:12:51.8315351Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215158.xml 2023-01-11T22:12:51.8315721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8315901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8316277Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8316455Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8316475Z 2023-01-11T22:12:51.8316583Z Running tests... 2023-01-11T22:12:51.8316846Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8317157Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8317574Z test_all_reduce_coalesced_group_product (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.8317594Z 2023-01-11T22:12:51.8317849Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8317962Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8317981Z 2023-01-11T22:12:51.8318087Z OK (skipped=1) 2023-01-11T22:12:51.8318106Z 2023-01-11T22:12:51.8318229Z Generating XML reports... 
2023-01-11T22:12:51.8318655Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215200.xml 2023-01-11T22:12:51.8319023Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8319198Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8319577Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8319769Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8319788Z 2023-01-11T22:12:51.8319896Z Running tests... 2023-01-11T22:12:51.8320157Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8320465Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8320854Z test_all_reduce_coalesced_group_sum (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.8320894Z 2023-01-11T22:12:51.8321136Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8321248Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8321267Z 2023-01-11T22:12:51.8321372Z OK (skipped=1) 2023-01-11T22:12:51.8321391Z 2023-01-11T22:12:51.8321514Z Generating XML reports... 
2023-01-11T22:12:51.8321959Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215203.xml 2023-01-11T22:12:51.8322326Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8322501Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8322876Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8323049Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8323143Z 2023-01-11T22:12:51.8323239Z Running tests... 2023-01-11T22:12:51.8323508Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8323818Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8324214Z test_all_reduce_coalesced_max (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.8324281Z 2023-01-11T22:12:51.8324552Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8324667Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8324689Z 2023-01-11T22:12:51.8324799Z OK (skipped=1) 2023-01-11T22:12:51.8324818Z 2023-01-11T22:12:51.8324942Z Generating XML reports... 
2023-01-11T22:12:51.8325362Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215205.xml 2023-01-11T22:12:51.8325730Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8325908Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8326284Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8326476Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8326496Z 2023-01-11T22:12:51.8326607Z Running tests... 2023-01-11T22:12:51.8326868Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8327177Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8327454Z test_all_reduce_coalesced_max_complex_unsupported (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8327702Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24021 2023-01-11T22:12:51.8327925Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24022 2023-01-11T22:12:51.8328297Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8328475Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8328852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8329043Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8329403Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8329578Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8329931Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8330120Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8330370Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8330613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8331009Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8331405Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8331633Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8332377Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:12:51.8332566Z warnings.warn( 2023-01-11T22:12:51.8332776Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8334013Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:12:51.8334146Z warnings.warn( 2023-01-11T22:12:51.8334250Z ok (4.249s) 2023-01-11T22:12:51.8334271Z 2023-01-11T22:12:51.8334550Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8334664Z Ran 1 test in 4.249s 2023-01-11T22:12:51.8334684Z 2023-01-11T22:12:51.8334779Z OK 2023-01-11T22:12:51.8334799Z 2023-01-11T22:12:51.8334923Z Generating XML reports... 
2023-01-11T22:12:51.8335371Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215208.xml 2023-01-11T22:12:51.8335729Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8335906Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8336286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8336481Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8336501Z 2023-01-11T22:12:51.8336609Z Running tests... 2023-01-11T22:12:51.8336874Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8337182Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8337575Z test_all_reduce_coalesced_min (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.8337599Z 2023-01-11T22:12:51.8337859Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8337955Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8337974Z 2023-01-11T22:12:51.8338082Z OK (skipped=1) 2023-01-11T22:12:51.8338100Z 2023-01-11T22:12:51.8338223Z Generating XML reports... 
2023-01-11T22:12:51.8338662Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215214.xml 2023-01-11T22:12:51.8339032Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8339207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8339581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8339773Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8339793Z 2023-01-11T22:12:51.8339901Z Running tests... 2023-01-11T22:12:51.8340150Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8340458Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8340859Z test_all_reduce_coalesced_product (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.8340878Z 2023-01-11T22:12:51.8341144Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8341255Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8341275Z 2023-01-11T22:12:51.8341381Z OK (skipped=1) 2023-01-11T22:12:51.8341400Z 2023-01-11T22:12:51.8341523Z Generating XML reports... 
2023-01-11T22:12:51.8341964Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215217.xml 2023-01-11T22:12:51.8342333Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8342578Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8342966Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8343157Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8343177Z 2023-01-11T22:12:51.8343287Z Running tests... 2023-01-11T22:12:51.8343595Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8343920Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8344319Z test_all_reduce_coalesced_sum (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.8344339Z 2023-01-11T22:12:51.8344598Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8344692Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8344728Z 2023-01-11T22:12:51.8344822Z OK (skipped=1) 2023-01-11T22:12:51.8344842Z 2023-01-11T22:12:51.8344965Z Generating XML reports... 
2023-01-11T22:12:51.8345405Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215219.xml 2023-01-11T22:12:51.8345773Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8345953Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8346335Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8346528Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8346547Z 2023-01-11T22:12:51.8346656Z Running tests... 2023-01-11T22:12:51.8346901Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8347211Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8347499Z test_all_reduce_complex_unsupported_ops (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8347721Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24223 2023-01-11T22:12:51.8347939Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24224 2023-01-11T22:12:51.8348310Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8348485Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8348861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8349034Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8349396Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8349574Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8349947Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8350135Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8350381Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8350627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8351024Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8351418Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8351630Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8351922Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8352028Z ok (4.336s) 2023-01-11T22:12:51.8352048Z 2023-01-11T22:12:51.8352319Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8352430Z Ran 1 test in 4.336s 2023-01-11T22:12:51.8352449Z 2023-01-11T22:12:51.8352541Z OK 2023-01-11T22:12:51.8352560Z 2023-01-11T22:12:51.8352687Z Generating XML reports... 2023-01-11T22:12:51.8353181Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215222.xml 2023-01-11T22:12:51.8353562Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8353721Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8354099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8354294Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8354313Z 2023-01-11T22:12:51.8354420Z Running tests... 2023-01-11T22:12:51.8354684Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8354994Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8355268Z test_all_reduce_full_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8355489Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24326 2023-01-11T22:12:51.8355689Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24327 2023-01-11T22:12:51.8356058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8356234Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8356617Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8356807Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8357164Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8357339Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8357714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8357902Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8358133Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8358375Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8358769Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8359167Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8359396Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8359639Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.8359861Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8360095Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.8360491Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8360863Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8361262Z STAGE:2023-01-11 21:52:32 24327:24327 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8361585Z STAGE:2023-01-11 21:52:32 24326:24326 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8361866Z [1673473952.879454] [7e0e28e30a97:24327:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8362146Z [1673473953.942112] [7e0e28e30a97:24327:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8362395Z [1673473953.942112] [7e0e28e30a97:24327:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8362671Z [1673473952.856614] [7e0e28e30a97:24326:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8362910Z [1673473953.898230] [7e0e28e30a97:24326:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8363149Z [1673473953.898230] [7e0e28e30a97:24326:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8363708Z STAGE:2023-01-11 21:52:34 24327:24327 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:52:34 24326:24326 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8363730Z 2023-01-11T22:12:51.8364298Z STAGE:2023-01-11 21:52:34 24327:24327 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:52:34 24326:24326 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8364319Z 2023-01-11T22:12:51.8364633Z STAGE:2023-01-11 21:52:34 24326:24326 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8364955Z STAGE:2023-01-11 21:52:34 24327:24327 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8365292Z STAGE:2023-01-11 21:52:34 24326:24326 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8365618Z STAGE:2023-01-11 21:52:34 24327:24327 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8365964Z STAGE:2023-01-11 21:52:34 24326:24326 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8366315Z STAGE:2023-01-11 21:52:34 24327:24327 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8366419Z ok (5.944s) 2023-01-11T22:12:51.8366438Z 2023-01-11T22:12:51.8366703Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8366817Z Ran 1 test in 5.945s 2023-01-11T22:12:51.8366836Z 2023-01-11T22:12:51.8366910Z OK 2023-01-11T22:12:51.8366929Z 2023-01-11T22:12:51.8367054Z Generating XML reports... 
2023-01-11T22:12:51.8367505Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215228.xml 2023-01-11T22:12:51.8367880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8368055Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8368434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8368630Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8368648Z 2023-01-11T22:12:51.8368759Z Running tests... 2023-01-11T22:12:51.8369020Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8369312Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8369582Z test_all_reduce_full_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8369867Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24440 2023-01-11T22:12:51.8370085Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24441 2023-01-11T22:12:51.8370461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8370638Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8371064Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8371264Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8371617Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8371793Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8372167Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8372361Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8372605Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8372845Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8373658Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8374063Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8374296Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8374519Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.8374745Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8374989Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.8375382Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8375716Z STAGE:2023-01-11 21:52:41 24440:24440 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8376113Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8376446Z STAGE:2023-01-11 21:52:41 24441:24441 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8376722Z [1673473961.434720] [7e0e28e30a97:24441:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8376956Z [1673473962.452473] [7e0e28e30a97:24441:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8377183Z [1673473962.452473] [7e0e28e30a97:24441:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8377457Z [1673473961.414025] [7e0e28e30a97:24440:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8377690Z [1673473962.480321] [7e0e28e30a97:24440:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8377922Z [1673473962.480321] [7e0e28e30a97:24440:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8378468Z STAGE:2023-01-11 21:52:42 24441:24441 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:52:42 24440:24440 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8378577Z 2023-01-11T22:12:51.8379162Z STAGE:2023-01-11 21:52:42 24440:24440 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:52:42 24441:24441 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8379183Z 2023-01-11T22:12:51.8379508Z STAGE:2023-01-11 21:52:42 24441:24441 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8379889Z STAGE:2023-01-11 21:52:42 24440:24440 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8380236Z STAGE:2023-01-11 21:52:42 24441:24441 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8380568Z STAGE:2023-01-11 21:52:42 24440:24440 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8380914Z STAGE:2023-01-11 21:52:42 24441:24441 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8381255Z STAGE:2023-01-11 21:52:42 24440:24440 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8381348Z ok (5.875s) 2023-01-11T22:12:51.8381367Z 2023-01-11T22:12:51.8381631Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8381742Z Ran 1 test in 5.875s 2023-01-11T22:12:51.8381762Z 2023-01-11T22:12:51.8381855Z OK 2023-01-11T22:12:51.8381874Z 2023-01-11T22:12:51.8381999Z Generating XML reports... 
2023-01-11T22:12:51.8382453Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215237.xml 2023-01-11T22:12:51.8382825Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8382999Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8383365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8383553Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8383575Z 2023-01-11T22:12:51.8383681Z Running tests... 2023-01-11T22:12:51.8383941Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8384251Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8384521Z test_all_reduce_full_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8384741Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24554 2023-01-11T22:12:51.8384956Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24555 2023-01-11T22:12:51.8385320Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8385478Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8385850Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8386040Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8386397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8386565Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8386935Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8387119Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8387358Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8387583Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8387977Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8388441Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8388668Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8388906Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.8389174Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8389418Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.8389814Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8390195Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8390509Z STAGE:2023-01-11 21:52:49 24554:24554 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8390834Z STAGE:2023-01-11 21:52:49 24555:24555 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8391103Z [1673473969.770373] [7e0e28e30a97:24554:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8391331Z [1673473970.841218] [7e0e28e30a97:24554:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8391562Z [1673473970.841218] [7e0e28e30a97:24554:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8391827Z [1673473969.773073] [7e0e28e30a97:24555:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8392046Z [1673473970.828278] [7e0e28e30a97:24555:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8392276Z [1673473970.828278] [7e0e28e30a97:24555:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8392813Z STAGE:2023-01-11 21:52:51 24554:24554 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:52:51 24555:24555 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8392835Z 2023-01-11T22:12:51.8393396Z STAGE:2023-01-11 21:52:51 24554:24554 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:52:51 24555:24555 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8393417Z 2023-01-11T22:12:51.8393734Z STAGE:2023-01-11 21:52:51 24554:24554 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8394046Z STAGE:2023-01-11 21:52:51 24555:24555 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8394367Z STAGE:2023-01-11 21:52:51 24554:24554 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8394688Z STAGE:2023-01-11 21:52:51 24555:24555 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8395021Z STAGE:2023-01-11 21:52:51 24554:24554 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8395354Z STAGE:2023-01-11 21:52:51 24555:24555 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8395450Z ok (5.931s) 2023-01-11T22:12:51.8395470Z 2023-01-11T22:12:51.8395728Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8395830Z Ran 1 test in 5.931s 2023-01-11T22:12:51.8395851Z 2023-01-11T22:12:51.8395934Z OK 2023-01-11T22:12:51.8395953Z 2023-01-11T22:12:51.8396060Z Generating XML reports... 
2023-01-11T22:12:51.8396501Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215245.xml 2023-01-11T22:12:51.8396933Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8397101Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8397475Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8397656Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8397736Z 2023-01-11T22:12:51.8397842Z Running tests... 2023-01-11T22:12:51.8398102Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8398402Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8398655Z test_all_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8398867Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24668 2023-01-11T22:12:51.8399079Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24669 2023-01-11T22:12:51.8399438Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8399602Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8399971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8400151Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8400502Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8400658Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8401021Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8401202Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8401438Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8401672Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8402060Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8402447Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8402675Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8402905Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.8403114Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8403349Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.8403734Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8404118Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8404448Z STAGE:2023-01-11 21:52:58 24668:24668 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8404767Z STAGE:2023-01-11 21:52:58 24669:24669 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8405034Z [1673473978.259894] [7e0e28e30a97:24669:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8405260Z [1673473979.306877] [7e0e28e30a97:24669:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8405553Z [1673473979.306877] [7e0e28e30a97:24669:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8405815Z [1673473978.252341] [7e0e28e30a97:24668:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8406071Z [1673473979.303914] [7e0e28e30a97:24668:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8406302Z [1673473979.303914] [7e0e28e30a97:24668:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8406851Z STAGE:2023-01-11 21:52:59 24669:24669 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:52:59 24668:24668 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8406872Z 2023-01-11T22:12:51.8407439Z STAGE:2023-01-11 21:52:59 24668:24668 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:52:59 24669:24669 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8407463Z 2023-01-11T22:12:51.8407787Z STAGE:2023-01-11 21:52:59 24668:24668 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8408101Z STAGE:2023-01-11 21:52:59 24669:24669 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8408432Z STAGE:2023-01-11 21:52:59 24668:24668 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8408762Z STAGE:2023-01-11 21:52:59 24669:24669 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8409102Z STAGE:2023-01-11 21:52:59 24668:24668 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8409440Z STAGE:2023-01-11 21:52:59 24669:24669 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8409541Z ok (5.932s) 2023-01-11T22:12:51.8409566Z 2023-01-11T22:12:51.8409815Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8409926Z Ran 1 test in 5.932s 2023-01-11T22:12:51.8409946Z 2023-01-11T22:12:51.8410035Z OK 2023-01-11T22:12:51.8410054Z 2023-01-11T22:12:51.8410176Z Generating XML reports... 
2023-01-11T22:12:51.8410624Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215254.xml 2023-01-11T22:12:51.8410997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8411171Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8411538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8411713Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8411740Z 2023-01-11T22:12:51.8411838Z Running tests... 2023-01-11T22:12:51.8412092Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8412397Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8412652Z test_all_reduce_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8413017Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24782 2023-01-11T22:12:51.8413243Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24783 2023-01-11T22:12:51.8413616Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8413789Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8414153Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8414340Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8414803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8414979Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8415356Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8415601Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8415857Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8416100Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8416487Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8416879Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8417104Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8417325Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8417476Z skip: Skipped due to small world size. (4.245s) 2023-01-11T22:12:51.8417496Z 2023-01-11T22:12:51.8417757Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8417862Z Ran 1 test in 4.245s 2023-01-11T22:12:51.8417882Z 2023-01-11T22:12:51.8417980Z OK (skipped=1) 2023-01-11T22:12:51.8417999Z 2023-01-11T22:12:51.8418114Z Generating XML reports... 2023-01-11T22:12:51.8418545Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215302.xml 2023-01-11T22:12:51.8418906Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8419076Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8419445Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8419629Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8419648Z 2023-01-11T22:12:51.8419747Z Running tests... 2023-01-11T22:12:51.8420007Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8420309Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8420562Z test_all_reduce_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8420764Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24885 2023-01-11T22:12:51.8420973Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24886 2023-01-11T22:12:51.8421341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8421507Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8421876Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8422058Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8422415Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8422579Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8422937Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8423117Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8423353Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8423658Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8424057Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8424446Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8424715Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8424939Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8425089Z skip: Skipped due to small world size. (4.225s) 2023-01-11T22:12:51.8425110Z 2023-01-11T22:12:51.8425363Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8425466Z Ran 1 test in 4.225s 2023-01-11T22:12:51.8425489Z 2023-01-11T22:12:51.8425589Z OK (skipped=1) 2023-01-11T22:12:51.8425608Z 2023-01-11T22:12:51.8425723Z Generating XML reports... 2023-01-11T22:12:51.8426158Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215309.xml 2023-01-11T22:12:51.8426517Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8426689Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8427057Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8427241Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8427261Z 2023-01-11T22:12:51.8427352Z Running tests... 2023-01-11T22:12:51.8427607Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8427908Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8428198Z test_all_reduce_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8428412Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24988 2023-01-11T22:12:51.8428621Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24989 2023-01-11T22:12:51.8428988Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8429154Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8429512Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8429692Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8430045Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8430216Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8430580Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8430756Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8430993Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8431230Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8431619Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8431996Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8432215Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8432499Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8432651Z skip: Skipped due to small world size. (4.241s) 2023-01-11T22:12:51.8432672Z 2023-01-11T22:12:51.8432935Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8433038Z Ran 1 test in 4.242s 2023-01-11T22:12:51.8433058Z 2023-01-11T22:12:51.8433160Z OK (skipped=1) 2023-01-11T22:12:51.8433180Z 2023-01-11T22:12:51.8433344Z Generating XML reports... 2023-01-11T22:12:51.8433788Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215316.xml 2023-01-11T22:12:51.8434148Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8434316Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8434686Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8434875Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8434896Z 2023-01-11T22:12:51.8434996Z Running tests... 2023-01-11T22:12:51.8435255Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8435557Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8435815Z test_all_reduce_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8436018Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25091 2023-01-11T22:12:51.8436225Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25092 2023-01-11T22:12:51.8436585Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8436751Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8437122Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8437302Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8437654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8437820Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8438183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8438353Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8438588Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8438820Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8439216Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8439604Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8439822Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8440043Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8440194Z skip: Skipped due to small world size. (4.220s) 2023-01-11T22:12:51.8440215Z 2023-01-11T22:12:51.8440473Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8440571Z Ran 1 test in 4.220s 2023-01-11T22:12:51.8440590Z 2023-01-11T22:12:51.8440686Z OK (skipped=1) 2023-01-11T22:12:51.8440705Z 2023-01-11T22:12:51.8440820Z Generating XML reports... 2023-01-11T22:12:51.8441260Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215323.xml 2023-01-11T22:12:51.8441697Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8441865Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8442233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8442464Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8442486Z 2023-01-11T22:12:51.8442582Z Running tests... 2023-01-11T22:12:51.8442844Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8443149Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8443398Z test_all_reduce_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8443611Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25194 2023-01-11T22:12:51.8443821Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25195 2023-01-11T22:12:51.8444179Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8444348Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8444711Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8444896Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8445249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8445414Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8445783Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8445967Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8446201Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8446432Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8446827Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8447202Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8447422Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8447748Z STAGE:2023-01-11 21:53:33 25194:25194 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8447971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8448303Z STAGE:2023-01-11 21:53:33 25195:25195 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8448576Z [1673474013.800230] [7e0e28e30a97:25194:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8448808Z [1673474014.838291] [7e0e28e30a97:25194:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8449040Z [1673474014.838291] [7e0e28e30a97:25194:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8449307Z [1673474013.801845] [7e0e28e30a97:25195:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8449534Z [1673474014.848239] [7e0e28e30a97:25195:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8449820Z [1673474014.848239] [7e0e28e30a97:25195:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8450382Z STAGE:2023-01-11 21:53:35 25194:25194 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:53:35 25195:25195 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8450403Z 2023-01-11T22:12:51.8451015Z STAGE:2023-01-11 21:53:35 25194:25194 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:53:35 25195:25195 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8451037Z 2023-01-11T22:12:51.8451368Z STAGE:2023-01-11 21:53:35 25194:25194 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8451684Z STAGE:2023-01-11 21:53:35 25195:25195 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8452017Z STAGE:2023-01-11 21:53:35 25194:25194 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8452348Z STAGE:2023-01-11 21:53:35 25195:25195 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8452687Z STAGE:2023-01-11 21:53:35 25194:25194 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8453421Z STAGE:2023-01-11 21:53:35 25195:25195 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8453543Z ok (5.839s) 2023-01-11T22:12:51.8453564Z 2023-01-11T22:12:51.8453836Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8453934Z Ran 1 test in 5.840s 2023-01-11T22:12:51.8453954Z 2023-01-11T22:12:51.8454040Z OK 2023-01-11T22:12:51.8454059Z 2023-01-11T22:12:51.8454180Z Generating XML reports... 
2023-01-11T22:12:51.8454621Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215329.xml 2023-01-11T22:12:51.8454983Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8455147Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8455515Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8455698Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8455718Z 2023-01-11T22:12:51.8455812Z Running tests... 2023-01-11T22:12:51.8456068Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8456370Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8456612Z test_all_reduce_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8456827Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25308 2023-01-11T22:12:51.8457042Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25309 2023-01-11T22:12:51.8457407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8457576Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8457938Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8458123Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8458481Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8458651Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8459025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8459208Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8459585Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8459863Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8460272Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8460727Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8460966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8461305Z STAGE:2023-01-11 21:53:42 25308:25308 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8461529Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8461855Z STAGE:2023-01-11 21:53:42 25309:25309 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8462133Z [1673474022.172624] [7e0e28e30a97:25309:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8462360Z [1673474023.225215] [7e0e28e30a97:25309:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8462595Z [1673474023.225215] [7e0e28e30a97:25309:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8462862Z [1673474022.172599] [7e0e28e30a97:25308:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8463084Z [1673474023.218606] [7e0e28e30a97:25308:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8463303Z [1673474023.218606] [7e0e28e30a97:25308:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8463857Z STAGE:2023-01-11 21:53:43 25309:25309 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:53:43 25308:25308 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8463879Z 2023-01-11T22:12:51.8464449Z STAGE:2023-01-11 21:53:43 25309:25309 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:53:43 25308:25308 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8464469Z 2023-01-11T22:12:51.8464795Z STAGE:2023-01-11 21:53:43 25308:25308 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8465110Z STAGE:2023-01-11 21:53:43 25309:25309 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8465439Z STAGE:2023-01-11 21:53:43 25308:25308 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8465766Z STAGE:2023-01-11 21:53:43 25309:25309 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8466105Z STAGE:2023-01-11 21:53:43 25308:25308 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8466444Z STAGE:2023-01-11 21:53:43 25309:25309 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8466540Z ok (5.832s) 2023-01-11T22:12:51.8466560Z 2023-01-11T22:12:51.8466822Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8466920Z Ran 1 test in 5.832s 2023-01-11T22:12:51.8466939Z 2023-01-11T22:12:51.8467029Z OK 2023-01-11T22:12:51.8467048Z 2023-01-11T22:12:51.8467169Z Generating XML reports... 
2023-01-11T22:12:51.8467615Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215338.xml 2023-01-11T22:12:51.8467982Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8468222Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8468612Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8468799Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8468819Z 2023-01-11T22:12:51.8468911Z Running tests... 2023-01-11T22:12:51.8469212Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8469531Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8469793Z test_all_reduce_multigpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8470009Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25422 2023-01-11T22:12:51.8470223Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25423 2023-01-11T22:12:51.8470596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8470768Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8471128Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8471312Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8471673Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8471846Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8472217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8472399Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8472639Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8472883Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8473279Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8473654Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8473880Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8474106Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8474442Z STAGE:2023-01-11 21:53:51 25423:25423 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8475202Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T22:12:51.8475318Z warnings.warn( 2023-01-11T22:12:51.8475642Z STAGE:2023-01-11 21:53:51 25422:25422 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8476399Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T22:12:51.8476513Z warnings.warn( 2023-01-11T22:12:51.8476786Z [1673474031.486853] [7e0e28e30a97:25422:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8477002Z [1673474031.496405] [7e0e28e30a97:25422:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8477315Z [1673474031.496405] [7e0e28e30a97:25422:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8477669Z STAGE:2023-01-11 21:53:51 25422:25422 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8477981Z [1673474031.488921] [7e0e28e30a97:25423:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. 
Fall back to non cooperative launch. 2023-01-11T22:12:51.8478211Z [1673474031.498195] [7e0e28e30a97:25423:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8478447Z [1673474031.498195] [7e0e28e30a97:25423:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8478794Z STAGE:2023-01-11 21:53:51 25423:25423 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8479143Z STAGE:2023-01-11 21:53:51 25422:25422 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8479499Z STAGE:2023-01-11 21:53:51 25423:25423 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8480029Z STAGE:2023-01-11 21:53:51 25423:25423 ActivityProfilerController.cpp:300] Completed Stage: Warm UpSTAGE:2023-01-11 21:53:51 25422:25422 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8480050Z 2023-01-11T22:12:51.8480383Z STAGE:2023-01-11 21:53:51 25422:25422 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8480696Z STAGE:2023-01-11 21:53:51 25423:25423 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8481041Z STAGE:2023-01-11 21:53:51 25422:25422 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8481379Z STAGE:2023-01-11 21:53:51 25423:25423 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8481485Z ok (5.820s) 2023-01-11T22:12:51.8481505Z 2023-01-11T22:12:51.8481770Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8481881Z Ran 1 test in 5.820s 2023-01-11T22:12:51.8481901Z 2023-01-11T22:12:51.8481993Z OK 2023-01-11T22:12:51.8482012Z 2023-01-11T22:12:51.8482133Z Generating XML reports... 
2023-01-11T22:12:51.8482565Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215346.xml 2023-01-11T22:12:51.8482938Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8483115Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8483494Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8483680Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8483704Z 2023-01-11T22:12:51.8483813Z Running tests... 2023-01-11T22:12:51.8484078Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8484388Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8484662Z test_all_reduce_multigpu_complex (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8484870Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25540 2023-01-11T22:12:51.8485092Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25541 2023-01-11T22:12:51.8485461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8485637Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8486012Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8486269Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8486647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8486822Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8487176Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8487406Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8487658Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8487902Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8488310Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8488713Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8488938Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8489160Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8489494Z STAGE:2023-01-11 21:53:59 25541:25541 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8490261Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T22:12:51.8490360Z warnings.warn( 2023-01-11T22:12:51.8490694Z STAGE:2023-01-11 21:53:59 25540:25540 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8491456Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1582: UserWarning: torch.distributed.all_reduce_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T22:12:51.8491570Z warnings.warn( 2023-01-11T22:12:51.8491846Z [1673474039.814761] [7e0e28e30a97:25540:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8492079Z [1673474039.824298] [7e0e28e30a97:25540:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8492317Z [1673474039.824298] [7e0e28e30a97:25540:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8492649Z STAGE:2023-01-11 21:54:00 25540:25540 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8493100Z [1673474039.816907] [7e0e28e30a97:25541:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. 
Fall back to non cooperative launch. 2023-01-11T22:12:51.8493347Z [1673474039.826189] [7e0e28e30a97:25541:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8493569Z [1673474039.826189] [7e0e28e30a97:25541:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8493923Z STAGE:2023-01-11 21:54:00 25541:25541 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8494274Z STAGE:2023-01-11 21:54:00 25540:25540 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8494620Z STAGE:2023-01-11 21:54:00 25541:25541 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8495148Z STAGE:2023-01-11 21:54:00 25540:25540 ActivityProfilerController.cpp:300] Completed Stage: Warm UpSTAGE:2023-01-11 21:54:00 25541:25541 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8495252Z 2023-01-11T22:12:51.8495606Z STAGE:2023-01-11 21:54:00 25540:25540 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8495938Z STAGE:2023-01-11 21:54:00 25541:25541 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8496284Z STAGE:2023-01-11 21:54:00 25540:25540 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8496680Z STAGE:2023-01-11 21:54:00 25541:25541 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8496788Z ok (5.813s) 2023-01-11T22:12:51.8496809Z 2023-01-11T22:12:51.8497062Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8497178Z Ran 1 test in 5.813s 2023-01-11T22:12:51.8497198Z 2023-01-11T22:12:51.8497294Z OK 2023-01-11T22:12:51.8497314Z 2023-01-11T22:12:51.8497443Z Generating XML reports... 
2023-01-11T22:12:51.8497898Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215354.xml 2023-01-11T22:12:51.8498269Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8498444Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8498827Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8499004Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8499039Z 2023-01-11T22:12:51.8499131Z Running tests... 2023-01-11T22:12:51.8499393Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8499704Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8499961Z test_all_reduce_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8500184Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25658 2023-01-11T22:12:51.8500399Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25659 2023-01-11T22:12:51.8500765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8500935Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8501303Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8501489Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8501854Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8502028Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8502401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8502587Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8502834Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8503071Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8503456Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8503853Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8504082Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8504413Z STAGE:2023-01-11 21:54:07 25658:25658 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8504705Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8505044Z STAGE:2023-01-11 21:54:07 25659:25659 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8505318Z [1673474047.276364] [7e0e28e30a97:25658:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8505599Z [1673474048.328087] [7e0e28e30a97:25658:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8505850Z [1673474048.328087] [7e0e28e30a97:25658:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8506191Z STAGE:2023-01-11 21:54:08 25658:25658 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8506448Z [1673474047.277880] [7e0e28e30a97:25659:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8506681Z [1673474048.328323] [7e0e28e30a97:25659:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8506918Z [1673474048.328323] [7e0e28e30a97:25659:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8507255Z STAGE:2023-01-11 21:54:08 25659:25659 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8507601Z STAGE:2023-01-11 21:54:08 25658:25658 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8507944Z STAGE:2023-01-11 21:54:08 25659:25659 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8508267Z STAGE:2023-01-11 21:54:08 25659:25659 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8508582Z STAGE:2023-01-11 21:54:08 25658:25658 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8509123Z STAGE:2023-01-11 21:54:08 25659:25659 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:54:08 25658:25658 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8509145Z 2023-01-11T22:12:51.8509709Z STAGE:2023-01-11 21:54:08 25659:25659 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:54:08 25658:25658 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8509732Z 2023-01-11T22:12:51.8509829Z ok (5.837s) 2023-01-11T22:12:51.8509849Z 2023-01-11T22:12:51.8510096Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8510204Z Ran 1 test in 5.837s 2023-01-11T22:12:51.8510224Z 2023-01-11T22:12:51.8510314Z OK 2023-01-11T22:12:51.8510333Z 2023-01-11T22:12:51.8510459Z Generating XML reports... 
2023-01-11T22:12:51.8510901Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215403.xml 2023-01-11T22:12:51.8511275Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8511449Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8511823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8512001Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8512038Z 2023-01-11T22:12:51.8512131Z Running tests... 2023-01-11T22:12:51.8512392Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8512696Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8512963Z test_all_reduce_result_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8513177Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25772 2023-01-11T22:12:51.8513454Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25773 2023-01-11T22:12:51.8513831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8514002Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8514408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8514602Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8514972Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8515144Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8515513Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8515704Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8515950Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8516188Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8516569Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8516966Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8517192Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8517413Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8517690Z [1673474056.339876] [7e0e28e30a97:25772:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8517925Z [1673474056.346506] [7e0e28e30a97:25772:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8518159Z [1673474056.346506] [7e0e28e30a97:25772:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8518430Z [1673474056.345557] [7e0e28e30a97:25773:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8518661Z [1673474056.351916] [7e0e28e30a97:25773:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8518893Z [1673474056.351916] [7e0e28e30a97:25773:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8518979Z ok (5.419s) 2023-01-11T22:12:51.8519012Z 2023-01-11T22:12:51.8519267Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8519385Z Ran 1 test in 5.419s 2023-01-11T22:12:51.8519405Z 2023-01-11T22:12:51.8519494Z OK 2023-01-11T22:12:51.8519513Z 2023-01-11T22:12:51.8519634Z Generating XML reports... 
2023-01-11T22:12:51.8520083Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215411.xml 2023-01-11T22:12:51.8520452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8520626Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8521003Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8521177Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8521197Z 2023-01-11T22:12:51.8521302Z Running tests... 2023-01-11T22:12:51.8521560Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8521951Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8522202Z test_all_reduce_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8522420Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25886 2023-01-11T22:12:51.8522638Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25887 2023-01-11T22:12:51.8523071Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8523236Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8523622Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8523815Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8524183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8524351Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8524727Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8524912Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8525160Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8525401Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8525784Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8526178Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8526406Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8526742Z STAGE:2023-01-11 21:54:23 25887:25887 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8526967Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8527295Z STAGE:2023-01-11 21:54:23 25886:25886 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8527571Z [1673474063.619111] [7e0e28e30a97:25887:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8527799Z [1673474064.640076] [7e0e28e30a97:25887:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8528038Z [1673474064.640076] [7e0e28e30a97:25887:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8528296Z [1673474063.596193] [7e0e28e30a97:25886:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8528521Z [1673474064.641361] [7e0e28e30a97:25886:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8528779Z [1673474064.641361] [7e0e28e30a97:25886:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8529343Z STAGE:2023-01-11 21:54:25 25887:25887 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:54:25 25886:25886 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8529366Z 2023-01-11T22:12:51.8529708Z STAGE:2023-01-11 21:54:25 25887:25887 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8530050Z STAGE:2023-01-11 21:54:25 25886:25886 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8530447Z STAGE:2023-01-11 21:54:25 25886:25886 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8530764Z STAGE:2023-01-11 21:54:25 25887:25887 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8531098Z STAGE:2023-01-11 21:54:25 25886:25886 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8531470Z STAGE:2023-01-11 21:54:25 25887:25887 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8531810Z STAGE:2023-01-11 21:54:25 25886:25886 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8532149Z STAGE:2023-01-11 21:54:25 25887:25887 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8532250Z ok (5.825s) 2023-01-11T22:12:51.8532269Z 2023-01-11T22:12:51.8532536Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8532644Z Ran 1 test in 5.825s 2023-01-11T22:12:51.8532668Z 2023-01-11T22:12:51.8532757Z OK 2023-01-11T22:12:51.8532777Z 2023-01-11T22:12:51.8533111Z Generating XML reports... 
2023-01-11T22:12:51.8533726Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215419.xml 2023-01-11T22:12:51.8534097Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8534262Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8534640Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8534833Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8534853Z 2023-01-11T22:12:51.8534956Z Running tests... 2023-01-11T22:12:51.8535220Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8535527Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8535787Z test_all_reduce_sum_async (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8536008Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26000 2023-01-11T22:12:51.8536210Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26001 2023-01-11T22:12:51.8536579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8536753Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8537129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8537314Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8537672Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8537848Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8538221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8538404Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8538632Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8538876Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8539269Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8539665Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8539892Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8540331Z STAGE:2023-01-11 21:54:31 26000:26000 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8540555Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8540881Z STAGE:2023-01-11 21:54:31 26001:26001 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8541217Z [1673474071.913381] [7e0e28e30a97:26000:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8541440Z [1673474072.966828] [7e0e28e30a97:26000:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8541676Z [1673474072.966828] [7e0e28e30a97:26000:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8541945Z [1673474071.937408] [7e0e28e30a97:26001:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8542179Z [1673474072.960312] [7e0e28e30a97:26001:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8542408Z [1673474072.960312] [7e0e28e30a97:26001:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8542962Z STAGE:2023-01-11 21:54:33 26000:26000 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:54:33 26001:26001 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8542983Z 2023-01-11T22:12:51.8543335Z STAGE:2023-01-11 21:54:33 26001:26001 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8543675Z STAGE:2023-01-11 21:54:33 26000:26000 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8543996Z STAGE:2023-01-11 21:54:33 26001:26001 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8544317Z STAGE:2023-01-11 21:54:33 26000:26000 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8544641Z STAGE:2023-01-11 21:54:33 26001:26001 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8544948Z STAGE:2023-01-11 21:54:33 26000:26000 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8545294Z STAGE:2023-01-11 21:54:33 26001:26001 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8545631Z STAGE:2023-01-11 21:54:33 26000:26000 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8545728Z ok (5.822s) 2023-01-11T22:12:51.8545747Z 2023-01-11T22:12:51.8546011Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8546122Z Ran 1 test in 5.822s 2023-01-11T22:12:51.8546141Z 2023-01-11T22:12:51.8546229Z OK 2023-01-11T22:12:51.8546248Z 2023-01-11T22:12:51.8546371Z Generating XML reports... 
2023-01-11T22:12:51.8546800Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215428.xml 2023-01-11T22:12:51.8547165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8547336Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8547719Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8547906Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8547926Z 2023-01-11T22:12:51.8548029Z Running tests... 2023-01-11T22:12:51.8548287Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8548590Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8548853Z test_all_reduce_sum_complex (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8549121Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26114 2023-01-11T22:12:51.8549337Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26115 2023-01-11T22:12:51.8549709Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8549924Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8550312Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8550500Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8550859Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8551025Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8551389Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8551575Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8551814Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8552052Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8552455Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8552851Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8553080Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8553411Z STAGE:2023-01-11 21:54:40 26115:26115 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8553641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8553953Z STAGE:2023-01-11 21:54:40 26114:26114 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8554227Z [1673474080.314068] [7e0e28e30a97:26114:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8554462Z [1673474081.373516] [7e0e28e30a97:26114:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8554699Z [1673474081.373516] [7e0e28e30a97:26114:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8555034Z STAGE:2023-01-11 21:54:41 26114:26114 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8555305Z [1673474080.336560] [7e0e28e30a97:26115:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8555537Z [1673474081.370864] [7e0e28e30a97:26115:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8555774Z [1673474081.370864] [7e0e28e30a97:26115:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8556113Z STAGE:2023-01-11 21:54:41 26115:26115 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8556459Z STAGE:2023-01-11 21:54:41 26114:26114 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8556786Z STAGE:2023-01-11 21:54:41 26115:26115 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8557113Z STAGE:2023-01-11 21:54:41 26114:26114 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8557434Z STAGE:2023-01-11 21:54:41 26115:26115 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8557840Z STAGE:2023-01-11 21:54:41 26114:26114 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8558166Z STAGE:2023-01-11 21:54:41 26115:26115 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8558512Z STAGE:2023-01-11 21:54:41 26114:26114 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8558897Z STAGE:2023-01-11 21:54:41 26115:26115 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8559010Z ok (5.833s) 2023-01-11T22:12:51.8559030Z 2023-01-11T22:12:51.8559300Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8559396Z Ran 1 test in 5.833s 2023-01-11T22:12:51.8559416Z 2023-01-11T22:12:51.8559510Z OK 2023-01-11T22:12:51.8559529Z 2023-01-11T22:12:51.8559655Z Generating XML reports... 
2023-01-11T22:12:51.8560100Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215436.xml 2023-01-11T22:12:51.8560473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8560652Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8561034Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8561226Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8561246Z 2023-01-11T22:12:51.8561356Z Running tests... 2023-01-11T22:12:51.8561602Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8561914Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8562211Z test_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and NCCL backends will have CUDA allReduce tested (0.002s) 2023-01-11T22:12:51.8562234Z 2023-01-11T22:12:51.8562492Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8562604Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8562623Z 2023-01-11T22:12:51.8562731Z OK (skipped=1) 2023-01-11T22:12:51.8562750Z 2023-01-11T22:12:51.8562875Z Generating XML reports... 
2023-01-11T22:12:51.8563318Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215444.xml 2023-01-11T22:12:51.8563674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8563851Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8564230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8564423Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8564446Z 2023-01-11T22:12:51.8564556Z Running tests... 2023-01-11T22:12:51.8564816Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8565126Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8565432Z test_all_reduce_sum_cuda_async (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and NCCL backends will have CUDA allReduce tested (0.002s) 2023-01-11T22:12:51.8565452Z 2023-01-11T22:12:51.8565713Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8565808Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8565828Z 2023-01-11T22:12:51.8565933Z OK (skipped=1) 2023-01-11T22:12:51.8565952Z 2023-01-11T22:12:51.8566074Z Generating XML reports... 
2023-01-11T22:12:51.8566513Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215447.xml 2023-01-11T22:12:51.8566879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8567115Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8567498Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8567688Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8567707Z 2023-01-11T22:12:51.8567816Z Running tests... 2023-01-11T22:12:51.8568107Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8568430Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8568740Z test_all_reduce_sum_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and NCCL backends will have CUDA allReduce tested (0.002s) 2023-01-11T22:12:51.8568760Z 2023-01-11T22:12:51.8569019Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8569136Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8569156Z 2023-01-11T22:12:51.8569264Z OK (skipped=1) 2023-01-11T22:12:51.8569284Z 2023-01-11T22:12:51.8569407Z Generating XML reports... 
2023-01-11T22:12:51.8569845Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215449.xml 2023-01-11T22:12:51.8570209Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8570369Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8570743Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8570936Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8570955Z 2023-01-11T22:12:51.8571064Z Running tests... 2023-01-11T22:12:51.8571322Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8571635Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8571881Z test_all_to_all (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T22:12:51.8571900Z 2023-01-11T22:12:51.8572157Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8572251Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8572286Z 2023-01-11T22:12:51.8572376Z OK (skipped=1) 2023-01-11T22:12:51.8572398Z 2023-01-11T22:12:51.8572521Z Generating XML reports... 
2023-01-11T22:12:51.8573425Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215451.xml 2023-01-11T22:12:51.8573885Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8574064Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8574442Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8574645Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8574666Z 2023-01-11T22:12:51.8574774Z Running tests... 2023-01-11T22:12:51.8575019Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8575334Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8575597Z test_all_to_all_complex (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T22:12:51.8575617Z 2023-01-11T22:12:51.8575876Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8575986Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8576005Z 2023-01-11T22:12:51.8576116Z OK (skipped=1) 2023-01-11T22:12:51.8576136Z 2023-01-11T22:12:51.8576257Z Generating XML reports... 
2023-01-11T22:12:51.8576695Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215454.xml 2023-01-11T22:12:51.8577172Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8577331Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8577707Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8577966Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8577989Z 2023-01-11T22:12:51.8578105Z Running tests... 2023-01-11T22:12:51.8578372Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8578683Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8578944Z test_all_to_all_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL supports CUDA all_to_all (0.002s) 2023-01-11T22:12:51.8578967Z 2023-01-11T22:12:51.8579231Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8579342Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8579361Z 2023-01-11T22:12:51.8579452Z OK (skipped=1) 2023-01-11T22:12:51.8579471Z 2023-01-11T22:12:51.8579592Z Generating XML reports... 
2023-01-11T22:12:51.8580034Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215456.xml 2023-01-11T22:12:51.8580405Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8580579Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8580956Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8581146Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8581165Z 2023-01-11T22:12:51.8581277Z Running tests... 2023-01-11T22:12:51.8581520Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8581826Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8582094Z test_all_to_all_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL supports CUDA all_to_all (0.002s) 2023-01-11T22:12:51.8582114Z 2023-01-11T22:12:51.8582376Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8582488Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8582507Z 2023-01-11T22:12:51.8582612Z OK (skipped=1) 2023-01-11T22:12:51.8582631Z 2023-01-11T22:12:51.8582754Z Generating XML reports... 
2023-01-11T22:12:51.8583191Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215459.xml 2023-01-11T22:12:51.8583557Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8583717Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8584091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8584281Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8584300Z 2023-01-11T22:12:51.8584407Z Running tests... 2023-01-11T22:12:51.8584669Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8584977Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8585236Z test_all_to_all_full_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T22:12:51.8585256Z 2023-01-11T22:12:51.8585512Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8585622Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8585641Z 2023-01-11T22:12:51.8585792Z OK (skipped=1) 2023-01-11T22:12:51.8585812Z 2023-01-11T22:12:51.8585938Z Generating XML reports... 
2023-01-11T22:12:51.8586386Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215501.xml 2023-01-11T22:12:51.8586754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8586989Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8587380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8587572Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8587591Z 2023-01-11T22:12:51.8587700Z Running tests... 2023-01-11T22:12:51.8587945Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8588253Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8588529Z test_all_to_all_full_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL supports CUDA all_to_all (0.002s) 2023-01-11T22:12:51.8588548Z 2023-01-11T22:12:51.8588806Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8588917Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8588937Z 2023-01-11T22:12:51.8589044Z OK (skipped=1) 2023-01-11T22:12:51.8589063Z 2023-01-11T22:12:51.8589189Z Generating XML reports... 
2023-01-11T22:12:51.8589628Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215503.xml 2023-01-11T22:12:51.8589996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8590153Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8590528Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8590721Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8590741Z 2023-01-11T22:12:51.8590847Z Running tests... 2023-01-11T22:12:51.8591111Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8591421Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8591676Z test_all_to_all_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports all_to_all (0.002s) 2023-01-11T22:12:51.8591696Z 2023-01-11T22:12:51.8591956Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8592067Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8592086Z 2023-01-11T22:12:51.8592176Z OK (skipped=1) 2023-01-11T22:12:51.8592194Z 2023-01-11T22:12:51.8592316Z Generating XML reports... 
2023-01-11T22:12:51.8592754Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215506.xml 2023-01-11T22:12:51.8593127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8593300Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8593675Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8593868Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8593888Z 2023-01-11T22:12:51.8593992Z Running tests... 2023-01-11T22:12:51.8594238Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8594548Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8594822Z test_all_to_all_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:12:51.8594841Z 2023-01-11T22:12:51.8595175Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8595289Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8595309Z 2023-01-11T22:12:51.8595418Z OK (skipped=1) 2023-01-11T22:12:51.8595437Z 2023-01-11T22:12:51.8595560Z Generating XML reports... 
2023-01-11T22:12:51.8595995Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215508.xml 2023-01-11T22:12:51.8596440Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8596609Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8596993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8597186Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8597205Z 2023-01-11T22:12:51.8597315Z Running tests... 2023-01-11T22:12:51.8597582Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8597889Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8598169Z test_all_to_all_single_equal_split (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:12:51.8598189Z 2023-01-11T22:12:51.8598445Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8598558Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8598577Z 2023-01-11T22:12:51.8598667Z OK (skipped=1) 2023-01-11T22:12:51.8598702Z 2023-01-11T22:12:51.8598809Z Generating XML reports... 
2023-01-11T22:12:51.8599247Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215510.xml 2023-01-11T22:12:51.8599614Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8599795Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8600170Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8600358Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8600377Z 2023-01-11T22:12:51.8600486Z Running tests... 2023-01-11T22:12:51.8600753Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8601043Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8601336Z test_all_to_all_single_equal_split_complex (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:12:51.8601355Z 2023-01-11T22:12:51.8601610Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8601721Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8601740Z 2023-01-11T22:12:51.8601849Z OK (skipped=1) 2023-01-11T22:12:51.8601868Z 2023-01-11T22:12:51.8601991Z Generating XML reports... 
2023-01-11T22:12:51.8602429Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215513.xml 2023-01-11T22:12:51.8602795Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8602968Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8603330Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8603520Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8603539Z 2023-01-11T22:12:51.8603647Z Running tests... 2023-01-11T22:12:51.8603912Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8604219Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8604571Z test_all_to_all_single_equal_split_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:12:51.8604591Z 2023-01-11T22:12:51.8604854Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8604967Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8604986Z 2023-01-11T22:12:51.8605096Z OK (skipped=1) 2023-01-11T22:12:51.8605115Z 2023-01-11T22:12:51.8605265Z Generating XML reports... 
2023-01-11T22:12:51.8605716Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215515.xml 2023-01-11T22:12:51.8606085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8606264Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8606638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8606829Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8606849Z 2023-01-11T22:12:51.8606956Z Running tests... 2023-01-11T22:12:51.8607216Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8607505Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8607807Z test_all_to_all_single_equal_split_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:12:51.8607828Z 2023-01-11T22:12:51.8608088Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8608204Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8608223Z 2023-01-11T22:12:51.8608330Z OK (skipped=1) 2023-01-11T22:12:51.8608349Z 2023-01-11T22:12:51.8608470Z Generating XML reports... 
2023-01-11T22:12:51.8608907Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215518.xml 2023-01-11T22:12:51.8609275Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8609448Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8609803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8609995Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8610014Z 2023-01-11T22:12:51.8610122Z Running tests... 2023-01-11T22:12:51.8610382Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8610690Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8610983Z test_all_to_all_single_equal_split_full_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:12:51.8611007Z 2023-01-11T22:12:51.8611265Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8611375Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8611394Z 2023-01-11T22:12:51.8611498Z OK (skipped=1) 2023-01-11T22:12:51.8611517Z 2023-01-11T22:12:51.8611623Z Generating XML reports... 
2023-01-11T22:12:51.8612064Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215520.xml 2023-01-11T22:12:51.8612432Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8612606Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8613179Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8613379Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8613478Z 2023-01-11T22:12:51.8613597Z Running tests... 2023-01-11T22:12:51.8613872Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8614164Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8614470Z test_all_to_all_single_equal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:12:51.8614490Z 2023-01-11T22:12:51.8614809Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8614931Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8614950Z 2023-01-11T22:12:51.8615058Z OK (skipped=1) 2023-01-11T22:12:51.8615077Z 2023-01-11T22:12:51.8615199Z Generating XML reports... 
2023-01-11T22:12:51.8615648Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215522.xml 2023-01-11T22:12:51.8616018Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8616197Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8616575Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8616747Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8616766Z 2023-01-11T22:12:51.8616871Z Running tests... 2023-01-11T22:12:51.8617134Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8617442Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8617734Z test_all_to_all_single_equal_split_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:12:51.8617754Z 2023-01-11T22:12:51.8618011Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8618125Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8618145Z 2023-01-11T22:12:51.8618252Z OK (skipped=1) 2023-01-11T22:12:51.8618271Z 2023-01-11T22:12:51.8618378Z Generating XML reports... 
2023-01-11T22:12:51.8618819Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215525.xml 2023-01-11T22:12:51.8619186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8619366Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8619740Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8619930Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8619950Z 2023-01-11T22:12:51.8620058Z Running tests... 2023-01-11T22:12:51.8620321Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8620636Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8620916Z test_all_to_all_single_equal_split_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:12:51.8620952Z 2023-01-11T22:12:51.8621194Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8621303Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8621323Z 2023-01-11T22:12:51.8621432Z OK (skipped=1) 2023-01-11T22:12:51.8621452Z 2023-01-11T22:12:51.8621574Z Generating XML reports... 
2023-01-11T22:12:51.8622010Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215527.xml 2023-01-11T22:12:51.8622381Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8622558Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8623015Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8623189Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8623209Z 2023-01-11T22:12:51.8623319Z Running tests... 2023-01-11T22:12:51.8623583Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8623939Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8624234Z test_all_to_all_single_unequal_split (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:12:51.8624254Z 2023-01-11T22:12:51.8624520Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8624634Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8624653Z 2023-01-11T22:12:51.8624760Z OK (skipped=1) 2023-01-11T22:12:51.8624778Z 2023-01-11T22:12:51.8624901Z Generating XML reports... 
2023-01-11T22:12:51.8625326Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215529.xml 2023-01-11T22:12:51.8625692Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8625865Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8626240Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8626430Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8626450Z 2023-01-11T22:12:51.8626556Z Running tests... 2023-01-11T22:12:51.8626816Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8627125Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8627406Z test_all_to_all_single_unequal_split_complex (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:12:51.8627447Z 2023-01-11T22:12:51.8627691Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8627802Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8627821Z 2023-01-11T22:12:51.8627927Z OK (skipped=1) 2023-01-11T22:12:51.8627946Z 2023-01-11T22:12:51.8628070Z Generating XML reports... 
2023-01-11T22:12:51.8628513Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215532.xml 2023-01-11T22:12:51.8628880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8629056Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8629465Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8629645Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8629682Z 2023-01-11T22:12:51.8629774Z Running tests... 2023-01-11T22:12:51.8630035Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8630345Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8630644Z test_all_to_all_single_unequal_split_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:12:51.8630665Z 2023-01-11T22:12:51.8630926Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8631037Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8631056Z 2023-01-11T22:12:51.8631163Z OK (skipped=1) 2023-01-11T22:12:51.8631182Z 2023-01-11T22:12:51.8631305Z Generating XML reports... 
2023-01-11T22:12:51.8631726Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215534.xml 2023-01-11T22:12:51.8632162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8632338Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8632715Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8632904Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8632969Z 2023-01-11T22:12:51.8633083Z Running tests... 2023-01-11T22:12:51.8633352Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8633660Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8633962Z test_all_to_all_single_unequal_split_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:12:51.8633982Z 2023-01-11T22:12:51.8634222Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8634337Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8634356Z 2023-01-11T22:12:51.8634463Z OK (skipped=1) 2023-01-11T22:12:51.8634482Z 2023-01-11T22:12:51.8634604Z Generating XML reports... 
2023-01-11T22:12:51.8635047Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215537.xml 2023-01-11T22:12:51.8635421Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8635597Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8635971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8636160Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8636180Z 2023-01-11T22:12:51.8636271Z Running tests... 2023-01-11T22:12:51.8636533Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8636845Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8637142Z test_all_to_all_single_unequal_split_full_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:12:51.8637162Z 2023-01-11T22:12:51.8637418Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8637532Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8637552Z 2023-01-11T22:12:51.8637658Z OK (skipped=1) 2023-01-11T22:12:51.8637676Z 2023-01-11T22:12:51.8637798Z Generating XML reports... 
2023-01-11T22:12:51.8638225Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215539.xml 2023-01-11T22:12:51.8638591Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8638771Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8639148Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8639337Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8639356Z 2023-01-11T22:12:51.8639465Z Running tests... 2023-01-11T22:12:51.8639728Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8640039Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8640344Z test_all_to_all_single_unequal_split_full_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:12:51.8640365Z 2023-01-11T22:12:51.8640622Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8640718Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8640737Z 2023-01-11T22:12:51.8640910Z OK (skipped=1) 2023-01-11T22:12:51.8640929Z 2023-01-11T22:12:51.8641052Z Generating XML reports... 
2023-01-11T22:12:51.8641499Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215541.xml 2023-01-11T22:12:51.8641863Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8642084Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8642477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8642668Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8642687Z 2023-01-11T22:12:51.8642778Z Running tests... 2023-01-11T22:12:51.8643042Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8643350Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8643650Z test_all_to_all_single_unequal_split_group (__main__.TestDistBackendWithSpawn) ... skip: Only MPI supports CPU all_to_all_single (0.002s) 2023-01-11T22:12:51.8643669Z 2023-01-11T22:12:51.8643929Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8644041Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8644061Z 2023-01-11T22:12:51.8644166Z OK (skipped=1) 2023-01-11T22:12:51.8644185Z 2023-01-11T22:12:51.8644311Z Generating XML reports... 
2023-01-11T22:12:51.8644751Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215544.xml 2023-01-11T22:12:51.8645101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8645275Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8645653Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8645846Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8645865Z 2023-01-11T22:12:51.8645973Z Running tests... 2023-01-11T22:12:51.8646236Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8646546Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8646849Z test_all_to_all_single_unequal_split_group_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA all_to_all_single (0.002s) 2023-01-11T22:12:51.8646869Z 2023-01-11T22:12:51.8647126Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8647220Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8647239Z 2023-01-11T22:12:51.8647345Z OK (skipped=1) 2023-01-11T22:12:51.8647366Z 2023-01-11T22:12:51.8647489Z Generating XML reports... 
2023-01-11T22:12:51.8647930Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215546.xml 2023-01-11T22:12:51.8648302Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8648479Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8648854Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8649046Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8649066Z 2023-01-11T22:12:51.8649174Z Running tests... 2023-01-11T22:12:51.8649418Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8649723Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8649988Z test_average_parameters (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8650279Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27119 2023-01-11T22:12:51.8650498Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27120 2023-01-11T22:12:51.8650877Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8651052Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8651491Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8651671Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8652042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8652218Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8652593Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8652787Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8653463Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8653727Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8654148Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8654541Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8654754Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8654982Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8655257Z [1673474154.089918] [7e0e28e30a97:27119:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8655494Z [1673474154.097073] [7e0e28e30a97:27119:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8655732Z [1673474154.097073] [7e0e28e30a97:27119:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8656005Z [1673474154.096952] [7e0e28e30a97:27120:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8656234Z [1673474154.102069] [7e0e28e30a97:27120:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8656469Z [1673474154.102069] [7e0e28e30a97:27120:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8656710Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.8656957Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.8657342Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8657735Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2023-01-11T22:12:51.8657843Z ok (6.064s) 2023-01-11T22:12:51.8657864Z 2023-01-11T22:12:51.8658127Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8658238Z Ran 1 test in 6.064s 2023-01-11T22:12:51.8658258Z 2023-01-11T22:12:51.8658350Z OK 2023-01-11T22:12:51.8658368Z 2023-01-11T22:12:51.8658491Z Generating XML reports... 2023-01-11T22:12:51.8658936Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215549.xml 2023-01-11T22:12:51.8659400Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8659579Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8659966Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8660160Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8660238Z 2023-01-11T22:12:51.8660357Z Running tests... 2023-01-11T22:12:51.8660625Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8660939Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8661201Z test_backend_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8661416Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27243 2023-01-11T22:12:51.8661623Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27244 2023-01-11T22:12:51.8661990Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8662164Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8662541Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8662735Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8663095Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8663267Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8663643Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8663816Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8664060Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8664302Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8664697Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8665093Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8665322Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8665548Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8665698Z skip: Need at least 3 CUDA devices (4.255s) 2023-01-11T22:12:51.8665718Z 2023-01-11T22:12:51.8665983Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8666082Z Ran 1 test in 4.255s 2023-01-11T22:12:51.8666102Z 2023-01-11T22:12:51.8666208Z OK (skipped=1) 2023-01-11T22:12:51.8666227Z 2023-01-11T22:12:51.8666350Z Generating XML reports... 2023-01-11T22:12:51.8666796Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215557.xml 2023-01-11T22:12:51.8667164Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8667340Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8667720Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8667911Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8667931Z 2023-01-11T22:12:51.8668039Z Running tests... 2023-01-11T22:12:51.8668284Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8668665Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8668920Z test_backend_group (__main__.TestDistBackendWithSpawn) ... 
skip: Test requires world size of 3 (0.002s) 2023-01-11T22:12:51.8668940Z 2023-01-11T22:12:51.8669200Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8669314Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8669380Z 2023-01-11T22:12:51.8669494Z OK (skipped=1) 2023-01-11T22:12:51.8669513Z 2023-01-11T22:12:51.8669636Z Generating XML reports... 2023-01-11T22:12:51.8670085Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215604.xml 2023-01-11T22:12:51.8670454Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8670612Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8670990Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8671181Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8671200Z 2023-01-11T22:12:51.8671307Z Running tests... 2023-01-11T22:12:51.8671568Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8671879Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8672129Z test_barrier (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support CPU barrier (0.002s) 2023-01-11T22:12:51.8672149Z 2023-01-11T22:12:51.8672408Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8672502Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8672522Z 2023-01-11T22:12:51.8672628Z OK (skipped=1) 2023-01-11T22:12:51.8672647Z 2023-01-11T22:12:51.8672768Z Generating XML reports... 
2023-01-11T22:12:51.8673211Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215606.xml 2023-01-11T22:12:51.8673576Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8673751Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8674128Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8674318Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8674337Z 2023-01-11T22:12:51.8674445Z Running tests... 2023-01-11T22:12:51.8674691Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8674997Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8675246Z test_barrier_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8675467Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27412 2023-01-11T22:12:51.8675687Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27413 2023-01-11T22:12:51.8676052Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8676224Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8676604Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8676778Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8677139Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8677313Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8677756Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8677943Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8678187Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8678430Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8678873Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8679284Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8679496Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8679725Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8680007Z [1673474173.915155] [7e0e28e30a97:27412:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8680237Z [1673474173.921518] [7e0e28e30a97:27412:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8680477Z [1673474173.921518] [7e0e28e30a97:27412:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8680750Z [1673474173.915909] [7e0e28e30a97:27413:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8680978Z [1673474173.922484] [7e0e28e30a97:27413:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8681215Z [1673474173.922484] [7e0e28e30a97:27413:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8681321Z ok (6.243s) 2023-01-11T22:12:51.8681341Z 2023-01-11T22:12:51.8681606Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8681702Z Ran 1 test in 6.243s 2023-01-11T22:12:51.8681721Z 2023-01-11T22:12:51.8681812Z OK 2023-01-11T22:12:51.8681831Z 2023-01-11T22:12:51.8681958Z Generating XML reports... 
2023-01-11T22:12:51.8682405Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215609.xml 2023-01-11T22:12:51.8682773Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8682953Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8683328Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8683518Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8683540Z 2023-01-11T22:12:51.8683633Z Running tests... 2023-01-11T22:12:51.8683891Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8684195Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8684454Z test_barrier_full_group (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support CPU barrier (0.002s) 2023-01-11T22:12:51.8684474Z 2023-01-11T22:12:51.8684735Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8684845Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8684864Z 2023-01-11T22:12:51.8684971Z OK (skipped=1) 2023-01-11T22:12:51.8684990Z 2023-01-11T22:12:51.8685113Z Generating XML reports... 
2023-01-11T22:12:51.8685552Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215618.xml 2023-01-11T22:12:51.8685900Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8686146Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8686532Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8686723Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8686744Z 2023-01-11T22:12:51.8686855Z Running tests... 2023-01-11T22:12:51.8687165Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8687494Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8687762Z test_barrier_full_group_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8687963Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27559 2023-01-11T22:12:51.8688182Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27560 2023-01-11T22:12:51.8688553Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8688726Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8689099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8689289Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8689652Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8689823Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8690196Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8690365Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8690610Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8690857Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8691253Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8691649Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8691875Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8692104Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8692265Z skip: Skipped due to small world size. (4.215s) 2023-01-11T22:12:51.8692285Z 2023-01-11T22:12:51.8692548Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8692643Z Ran 1 test in 4.215s 2023-01-11T22:12:51.8692666Z 2023-01-11T22:12:51.8692774Z OK (skipped=1) 2023-01-11T22:12:51.8692793Z 2023-01-11T22:12:51.8693101Z Generating XML reports... 2023-01-11T22:12:51.8693562Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215620.xml 2023-01-11T22:12:51.8693934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8694114Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8694489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8694681Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8694701Z 2023-01-11T22:12:51.8694808Z Running tests... 2023-01-11T22:12:51.8695057Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8695366Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8695715Z test_barrier_group (__main__.TestDistBackendWithSpawn) ... 
skip: ucc does not support CPU barrier (0.002s) 2023-01-11T22:12:51.8695735Z 2023-01-11T22:12:51.8696004Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8696119Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8696138Z 2023-01-11T22:12:51.8696248Z OK (skipped=1) 2023-01-11T22:12:51.8696267Z 2023-01-11T22:12:51.8696452Z Generating XML reports... 2023-01-11T22:12:51.8696910Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215627.xml 2023-01-11T22:12:51.8697260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8697440Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8697818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8698014Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8698033Z 2023-01-11T22:12:51.8698142Z Running tests... 2023-01-11T22:12:51.8698401Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8698710Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8698974Z test_barrier_group_cuda (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8699192Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27695 2023-01-11T22:12:51.8699393Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27696 2023-01-11T22:12:51.8699765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8699942Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8700328Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8700517Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8700880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8701059Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8701438Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8701608Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8701855Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8702097Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8702503Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8702896Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8703127Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8703358Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8703518Z skip: Skipped due to small world size. (4.240s) 2023-01-11T22:12:51.8703538Z 2023-01-11T22:12:51.8703803Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8703899Z Ran 1 test in 4.240s 2023-01-11T22:12:51.8703919Z 2023-01-11T22:12:51.8704026Z OK (skipped=1) 2023-01-11T22:12:51.8704046Z 2023-01-11T22:12:51.8704170Z Generating XML reports... 2023-01-11T22:12:51.8704614Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215629.xml 2023-01-11T22:12:51.8705054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8705236Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8705615Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8705857Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8705878Z 2023-01-11T22:12:51.8705996Z Running tests... 2023-01-11T22:12:51.8706248Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8706559Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8706839Z test_barrier_timeout_full_group (__main__.TestDistBackendWithSpawn) ... 
skip: Only gloo backend supports timeouts (0.002s) 2023-01-11T22:12:51.8706863Z 2023-01-11T22:12:51.8707119Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8707232Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8707251Z 2023-01-11T22:12:51.8707359Z OK (skipped=1) 2023-01-11T22:12:51.8707378Z 2023-01-11T22:12:51.8707501Z Generating XML reports... 2023-01-11T22:12:51.8707946Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215636.xml 2023-01-11T22:12:51.8708315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8708474Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8708852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8709044Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8709064Z 2023-01-11T22:12:51.8709174Z Running tests... 2023-01-11T22:12:51.8709439Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8709748Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8710023Z test_barrier_timeout_global (__main__.TestDistBackendWithSpawn) ... skip: Only gloo backend supports timeouts (0.002s) 2023-01-11T22:12:51.8710043Z 2023-01-11T22:12:51.8710304Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8710400Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8710438Z 2023-01-11T22:12:51.8710529Z OK (skipped=1) 2023-01-11T22:12:51.8710548Z 2023-01-11T22:12:51.8710671Z Generating XML reports... 
2023-01-11T22:12:51.8711116Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215638.xml 2023-01-11T22:12:51.8711482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8711662Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8712036Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8712225Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8712245Z 2023-01-11T22:12:51.8712353Z Running tests... 2023-01-11T22:12:51.8712601Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8712907Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8713179Z test_barrier_timeout_group (__main__.TestDistBackendWithSpawn) ... skip: Only gloo backend supports timeouts (0.002s) 2023-01-11T22:12:51.8713198Z 2023-01-11T22:12:51.8713458Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8713569Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8713645Z 2023-01-11T22:12:51.8713762Z OK (skipped=1) 2023-01-11T22:12:51.8713781Z 2023-01-11T22:12:51.8713906Z Generating XML reports... 
2023-01-11T22:12:51.8714356Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215641.xml 2023-01-11T22:12:51.8714723Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8714943Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8715339Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8715529Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8715549Z 2023-01-11T22:12:51.8715658Z Running tests... 2023-01-11T22:12:51.8715922Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8716229Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8716487Z test_batch_isend_irecv_gloo (__main__.TestDistBackendWithSpawn) ... skip: GLOO Batch Send Recv CPU (0.002s) 2023-01-11T22:12:51.8716507Z 2023-01-11T22:12:51.8716760Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8716856Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8716892Z 2023-01-11T22:12:51.8716983Z OK (skipped=1) 2023-01-11T22:12:51.8717002Z 2023-01-11T22:12:51.8717128Z Generating XML reports... 
2023-01-11T22:12:51.8717572Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215643.xml 2023-01-11T22:12:51.8717937Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8718113Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8718487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8718680Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8718700Z 2023-01-11T22:12:51.8718808Z Running tests... 2023-01-11T22:12:51.8719053Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8719361Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8719626Z test_batch_isend_irecv_gloo_tags (__main__.TestDistBackendWithSpawn) ... skip: GLOO Batch Send Recv CPU (0.002s) 2023-01-11T22:12:51.8719646Z 2023-01-11T22:12:51.8719903Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8720015Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8720034Z 2023-01-11T22:12:51.8720141Z OK (skipped=1) 2023-01-11T22:12:51.8720160Z 2023-01-11T22:12:51.8720282Z Generating XML reports... 
2023-01-11T22:12:51.8720720Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215645.xml 2023-01-11T22:12:51.8721088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8721246Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8721616Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8721809Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8721829Z 2023-01-11T22:12:51.8721936Z Running tests... 2023-01-11T22:12:51.8722194Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8722500Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8722773Z test_batch_isend_irecv_mixed_backend_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T22:12:51.8722792Z 2023-01-11T22:12:51.8723121Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8723234Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8723253Z 2023-01-11T22:12:51.8723343Z OK (skipped=1) 2023-01-11T22:12:51.8723362Z 2023-01-11T22:12:51.8723486Z Generating XML reports... 
2023-01-11T22:12:51.8723931Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215648.xml 2023-01-11T22:12:51.8724345Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8724529Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8724912Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8725103Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8725122Z 2023-01-11T22:12:51.8725232Z Running tests... 2023-01-11T22:12:51.8725482Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8725793Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8726047Z test_batch_isend_irecv_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.003s) 2023-01-11T22:12:51.8726067Z 2023-01-11T22:12:51.8726325Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8726441Z Ran 1 test in 0.003s 2023-01-11T22:12:51.8726460Z 2023-01-11T22:12:51.8726567Z OK (skipped=1) 2023-01-11T22:12:51.8726586Z 2023-01-11T22:12:51.8726709Z Generating XML reports... 
2023-01-11T22:12:51.8727151Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215650.xml 2023-01-11T22:12:51.8727520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8727683Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8728059Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8728249Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8728269Z 2023-01-11T22:12:51.8728377Z Running tests... 2023-01-11T22:12:51.8728639Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8728952Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8729223Z test_batch_isend_irecv_no_rank_zero_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T22:12:51.8729242Z 2023-01-11T22:12:51.8729498Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8729609Z Ran 1 test in 0.003s 2023-01-11T22:12:51.8729629Z 2023-01-11T22:12:51.8729742Z OK (skipped=1) 2023-01-11T22:12:51.8729766Z 2023-01-11T22:12:51.8729893Z Generating XML reports... 
2023-01-11T22:12:51.8730336Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215652.xml 2023-01-11T22:12:51.8730704Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8730877Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8731257Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8731451Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8731470Z 2023-01-11T22:12:51.8731581Z Running tests... 2023-01-11T22:12:51.8731825Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8732136Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8732459Z test_batch_isend_irecv_op_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T22:12:51.8732479Z 2023-01-11T22:12:51.8732742Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8733019Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8733054Z 2023-01-11T22:12:51.8733254Z OK (skipped=1) 2023-01-11T22:12:51.8733283Z 2023-01-11T22:12:51.8733487Z Generating XML reports... 
2023-01-11T22:12:51.8734031Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215655.xml 2023-01-11T22:12:51.8734420Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8734579Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8734954Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8735149Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8735169Z 2023-01-11T22:12:51.8735276Z Running tests... 2023-01-11T22:12:51.8735537Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8735844Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8736109Z test_batch_isend_irecv_op_list_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T22:12:51.8736128Z 2023-01-11T22:12:51.8736386Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8736498Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8736517Z 2023-01-11T22:12:51.8736606Z OK (skipped=1) 2023-01-11T22:12:51.8736624Z 2023-01-11T22:12:51.8736746Z Generating XML reports... 
2023-01-11T22:12:51.8737187Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215657.xml 2023-01-11T22:12:51.8737557Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8737730Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8738107Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8738295Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8738317Z 2023-01-11T22:12:51.8738425Z Running tests... 2023-01-11T22:12:51.8738683Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8738975Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8739251Z test_batch_isend_irecv_ring_exchange_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T22:12:51.8739270Z 2023-01-11T22:12:51.8739524Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8739637Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8739656Z 2023-01-11T22:12:51.8739761Z OK (skipped=1) 2023-01-11T22:12:51.8739780Z 2023-01-11T22:12:51.8739902Z Generating XML reports... 
2023-01-11T22:12:51.8740342Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215700.xml 2023-01-11T22:12:51.8740712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8740871Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8741253Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8741441Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8741461Z 2023-01-11T22:12:51.8741568Z Running tests... 2023-01-11T22:12:51.8741828Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8742265Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8742525Z test_batch_isend_irecv_self_nccl (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T22:12:51.8742545Z 2023-01-11T22:12:51.8742805Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8742917Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8742981Z 2023-01-11T22:12:51.8743078Z OK (skipped=1) 2023-01-11T22:12:51.8743097Z 2023-01-11T22:12:51.8743222Z Generating XML reports... 
2023-01-11T22:12:51.8743663Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215702.xml 2023-01-11T22:12:51.8744031Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8744208Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8744587Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8744776Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8744796Z 2023-01-11T22:12:51.8744904Z Running tests... 2023-01-11T22:12:51.8745167Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8745459Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8745721Z test_batch_isend_irecv_tensor_err (__main__.TestDistBackendWithSpawn) ... skip: NCCL Batch Send Recv Only (0.002s) 2023-01-11T22:12:51.8745740Z 2023-01-11T22:12:51.8745998Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8746107Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8746126Z 2023-01-11T22:12:51.8746231Z OK (skipped=1) 2023-01-11T22:12:51.8746250Z 2023-01-11T22:12:51.8746375Z Generating XML reports... 
2023-01-11T22:12:51.8746810Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215704.xml 2023-01-11T22:12:51.8747177Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8747350Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8747710Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8747903Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8747922Z 2023-01-11T22:12:51.8748029Z Running tests... 2023-01-11T22:12:51.8748288Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8748594Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8748838Z test_broadcast (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8749062Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28227 2023-01-11T22:12:51.8749275Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28228 2023-01-11T22:12:51.8749623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8749801Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8750174Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8750362Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8750719Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8750889Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8751341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8751533Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8751761Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8752008Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8752452Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8752858Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8753089Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8753425Z STAGE:2023-01-11 21:57:11 28228:28228 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8753651Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8753974Z STAGE:2023-01-11 21:57:11 28227:28227 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8754253Z [1673474231.191294] [7e0e28e30a97:28228:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8754487Z [1673474232.215108] [7e0e28e30a97:28228:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8754711Z [1673474232.215108] [7e0e28e30a97:28228:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8754981Z [1673474231.169872] [7e0e28e30a97:28227:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8755209Z [1673474232.205953] [7e0e28e30a97:28227:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8755448Z [1673474232.205953] [7e0e28e30a97:28227:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8755999Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8756026Z 2023-01-11T22:12:51.8756594Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8756615Z 2023-01-11T22:12:51.8756940Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8757266Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8757600Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8757924Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8758268Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8758599Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8758925Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8759241Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8759568Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:306] Completed Stage: 
Collection 2023-01-11T22:12:51.8759892Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8760339Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8760682Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8761010Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8761378Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8761705Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8762033Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8762377Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8762724Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8763050Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8763368Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8763698Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8764030Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8764369Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8764694Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 
2023-01-11T22:12:51.8765017Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8765338Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8765669Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8765994Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8766338Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8766679Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8767004Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8767324Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8767638Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8767964Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8768307Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8768646Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8768974Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8769293Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8769621Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8769943Z 
STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8770282Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8770679Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8771002Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8771326Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8771706Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8772044Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8772387Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8772724Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8773327Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8773667Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8773982Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8774314Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8774660Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8775000Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8775323Z STAGE:2023-01-11 
21:57:12 28228:28228 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8775641Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8775975Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8776299Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8776635Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8776963Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8777285Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8777609Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8777940Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8778265Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8778612Z STAGE:2023-01-11 21:57:12 28227:28227 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8778953Z STAGE:2023-01-11 21:57:12 28228:28228 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8779057Z ok (5.951s) 2023-01-11T22:12:51.8779077Z 2023-01-11T22:12:51.8779343Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8779440Z Ran 1 test in 5.952s 2023-01-11T22:12:51.8779462Z 2023-01-11T22:12:51.8779555Z OK 2023-01-11T22:12:51.8779575Z 2023-01-11T22:12:51.8779699Z Generating XML reports... 
2023-01-11T22:12:51.8780149Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215707.xml 2023-01-11T22:12:51.8780519Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8780701Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8781193Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8781387Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8781407Z 2023-01-11T22:12:51.8781499Z Running tests... 2023-01-11T22:12:51.8781763Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8782151Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8782446Z test_broadcast_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo and Nccl backend supports CUDA allReduce (0.002s) 2023-01-11T22:12:51.8782466Z 2023-01-11T22:12:51.8782731Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8782850Z Ran 1 test in 0.002s 2023-01-11T22:12:51.8782869Z 2023-01-11T22:12:51.8782977Z OK (skipped=1) 2023-01-11T22:12:51.8782996Z 2023-01-11T22:12:51.8783127Z Generating XML reports... 
2023-01-11T22:12:51.8783574Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215715.xml 2023-01-11T22:12:51.8783924Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8784102Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8784482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8784674Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8784693Z 2023-01-11T22:12:51.8784801Z Running tests... 2023-01-11T22:12:51.8785062Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8785371Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8785633Z test_broadcast_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8785841Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28374 2023-01-11T22:12:51.8786059Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28375 2023-01-11T22:12:51.8786428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8786605Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8786986Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8787184Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8787548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8787721Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8788097Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8788270Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8788515Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8788759Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8789155Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8789550Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8789779Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8790017Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.8790306Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8790544Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.8790930Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8791367Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.8791720Z STAGE:2023-01-11 21:57:21 28375:28375 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8792044Z STAGE:2023-01-11 21:57:21 28374:28374 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8792325Z [1673474242.016033] [7e0e28e30a97:28375:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8792563Z [1673474243.062155] [7e0e28e30a97:28375:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8792808Z [1673474243.062155] [7e0e28e30a97:28375:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8793081Z [1673474241.994290] [7e0e28e30a97:28374:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8793300Z [1673474243.035823] [7e0e28e30a97:28374:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8793541Z [1673474243.035823] [7e0e28e30a97:28374:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8794094Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8794119Z 2023-01-11T22:12:51.8794468Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8794815Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8795145Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8795471Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8795804Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8796114Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8796458Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8796807Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8797132Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8797451Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8797786Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:306] Completed Stage: 
Collection 2023-01-11T22:12:51.8798105Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8798451Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8798786Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8799094Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8799484Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8799817Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8800152Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8800536Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8800890Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8801215Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8801536Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8801866Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8802179Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8802519Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8802862Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 
2023-01-11T22:12:51.8803191Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8803511Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8803841Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8804163Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8804505Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8804849Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8805156Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8805474Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8805811Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8806139Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8806482Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8806820Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8807152Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8807471Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8807784Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8808112Z 
STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8808457Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8808799Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8809123Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8809443Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8809844Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8810177Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8810517Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8810884Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8811223Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8811542Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8811875Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8812204Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8812549Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8813070Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8813410Z STAGE:2023-01-11 
21:57:23 28375:28375 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8813738Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8814053Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8814377Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8814721Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8815059Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8815388Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8815710Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.8816039Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8816368Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.8816712Z STAGE:2023-01-11 21:57:23 28374:28374 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8817033Z STAGE:2023-01-11 21:57:23 28375:28375 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.8817138Z ok (5.832s) 2023-01-11T22:12:51.8817157Z 2023-01-11T22:12:51.8817422Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8817544Z Ran 1 test in 5.833s 2023-01-11T22:12:51.8817563Z 2023-01-11T22:12:51.8817657Z OK 2023-01-11T22:12:51.8817676Z 2023-01-11T22:12:51.8817802Z Generating XML reports... 
2023-01-11T22:12:51.8818251Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215718.xml 2023-01-11T22:12:51.8818627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8818805Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8819166Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8819360Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8819380Z 2023-01-11T22:12:51.8819490Z Running tests... 2023-01-11T22:12:51.8819753Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8820160Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8820417Z test_broadcast_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8820638Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28488 2023-01-11T22:12:51.8820857Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28489 2023-01-11T22:12:51.8821271Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8821458Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8821845Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8822039Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8822403Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8822580Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8822961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8823151Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8823399Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8823625Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8824025Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8824418Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8824647Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8824878Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8825039Z skip: Skipped due to small world size. (4.214s) 2023-01-11T22:12:51.8825059Z 2023-01-11T22:12:51.8825328Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8825439Z Ran 1 test in 4.214s 2023-01-11T22:12:51.8825459Z 2023-01-11T22:12:51.8825568Z OK (skipped=1) 2023-01-11T22:12:51.8825589Z 2023-01-11T22:12:51.8825696Z Generating XML reports... 2023-01-11T22:12:51.8826140Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215726.xml 2023-01-11T22:12:51.8826513Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8826691Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8827074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8827267Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8827286Z 2023-01-11T22:12:51.8827396Z Running tests... 2023-01-11T22:12:51.8827659Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8827955Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8828217Z test_broadcast_multigpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8828434Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28591 2023-01-11T22:12:51.8828652Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28592 2023-01-11T22:12:51.8829020Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8829261Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8829647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8829839Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8830202Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8830433Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8830823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8831011Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8831256Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8831499Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8831907Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8832302Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8832532Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8832747Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8833521Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1478: UserWarning: torch.distributed.broadcast_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T22:12:51.8833639Z warnings.warn( 2023-01-11T22:12:51.8834404Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1478: UserWarning: torch.distributed.broadcast_multigpu will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#multi-gpu-collective-functions 2023-01-11T22:12:51.8834518Z warnings.warn( 2023-01-11T22:12:51.8834794Z [1673474257.916799] [7e0e28e30a97:28591:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8835031Z [1673474257.924065] [7e0e28e30a97:28591:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8835274Z [1673474257.924065] [7e0e28e30a97:28591:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8835549Z [1673474257.925384] [7e0e28e30a97:28592:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8835785Z [1673474257.930274] [7e0e28e30a97:28592:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8836021Z [1673474257.930274] [7e0e28e30a97:28592:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8836107Z ok (5.533s) 2023-01-11T22:12:51.8836146Z 2023-01-11T22:12:51.8836401Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8836518Z Ran 1 test in 5.533s 2023-01-11T22:12:51.8836539Z 2023-01-11T22:12:51.8836633Z OK 2023-01-11T22:12:51.8836653Z 2023-01-11T22:12:51.8836780Z Generating XML reports... 2023-01-11T22:12:51.8837231Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215733.xml 2023-01-11T22:12:51.8837598Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8837837Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8838219Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8838394Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8838413Z 2023-01-11T22:12:51.8838523Z Running tests... 2023-01-11T22:12:51.8838787Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8839142Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8839418Z test_broadcast_object_list (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8840170Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82847 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.616s) 2023-01-11T22:12:51.8840194Z 2023-01-11T22:12:51.8840460Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8840572Z Ran 1 test in 1.616s 2023-01-11T22:12:51.8840592Z 2023-01-11T22:12:51.8840701Z OK (skipped=1) 2023-01-11T22:12:51.8840721Z 2023-01-11T22:12:51.8840828Z Generating XML reports... 2023-01-11T22:12:51.8841275Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215741.xml 2023-01-11T22:12:51.8841648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8841828Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8842213Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8842404Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8842426Z 2023-01-11T22:12:51.8842538Z Running tests... 2023-01-11T22:12:51.8842802Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8843111Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8843407Z test_compute_bucket_assignment_by_size_sparse_error_with_logger (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8844155Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/85012 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.622s) 2023-01-11T22:12:51.8844176Z 2023-01-11T22:12:51.8844441Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8844556Z Ran 1 test in 1.622s 2023-01-11T22:12:51.8844580Z 2023-01-11T22:12:51.8844687Z OK (skipped=1) 2023-01-11T22:12:51.8844706Z 2023-01-11T22:12:51.8844830Z Generating XML reports... 2023-01-11T22:12:51.8845275Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215745.xml 2023-01-11T22:12:51.8845643Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8845824Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8846203Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8846379Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8846398Z 2023-01-11T22:12:51.8846506Z Running tests... 2023-01-11T22:12:51.8846771Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8847083Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8847461Z test_compute_bucket_assignment_by_size_sparse_error_without_logger (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8848266Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/85339 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.660s) 2023-01-11T22:12:51.8848289Z 2023-01-11T22:12:51.8848558Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8848671Z Ran 1 test in 1.660s 2023-01-11T22:12:51.8848691Z 2023-01-11T22:12:51.8848799Z OK (skipped=1) 2023-01-11T22:12:51.8848819Z 2023-01-11T22:12:51.8848941Z Generating XML reports... 2023-01-11T22:12:51.8849364Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215749.xml 2023-01-11T22:12:51.8849739Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8849918Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8850300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8850496Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8850515Z 2023-01-11T22:12:51.8850626Z Running tests... 2023-01-11T22:12:51.8850894Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8851207Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8851464Z test_ddp_apply_optim_in_backward (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8851685Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28807 2023-01-11T22:12:51.8851907Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28808 2023-01-11T22:12:51.8852278Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8852454Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8852836Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8853309Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8853771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8853947Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8854302Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8854497Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8854743Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8854987Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8855387Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8855785Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8856012Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8856236Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8857010Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T22:12:51.8857203Z warnings.warn( 2023-01-11T22:12:51.8858038Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T22:12:51.8858156Z warnings.warn( 2023-01-11T22:12:51.8858437Z [1673474278.402935] [7e0e28e30a97:28808:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8858673Z [1673474278.408362] [7e0e28e30a97:28808:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8858917Z [1673474278.408362] [7e0e28e30a97:28808:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8859187Z [1673474278.393877] [7e0e28e30a97:28807:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8859424Z [1673474278.399169] [7e0e28e30a97:28807:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8859662Z [1673474278.399169] [7e0e28e30a97:28807:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8859900Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8860119Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8860352Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8860589Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8860817Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8861046Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8861274Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8861507Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8861610Z ok (7.446s) 2023-01-11T22:12:51.8861631Z 2023-01-11T22:12:51.8861907Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8862003Z Ran 1 test in 7.446s 2023-01-11T22:12:51.8862022Z 2023-01-11T22:12:51.8862116Z OK 2023-01-11T22:12:51.8862135Z 2023-01-11T22:12:51.8862261Z Generating XML reports... 
2023-01-11T22:12:51.8862712Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215753.xml 2023-01-11T22:12:51.8863090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8863268Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8863647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8863844Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8863863Z 2023-01-11T22:12:51.8863956Z Running tests... 2023-01-11T22:12:51.8864224Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8864535Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8864843Z test_ddp_apply_optim_in_backward_grad_as_bucket_view_false (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8865129Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28925 2023-01-11T22:12:51.8865347Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28926 2023-01-11T22:12:51.8865721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8865896Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8866322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8866502Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8866871Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8867045Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8867422Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8867615Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8867861Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8868105Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8868510Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8868907Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8869119Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8869348Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8870126Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T22:12:51.8870241Z warnings.warn( 2023-01-11T22:12:51.8871001Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T22:12:51.8871111Z warnings.warn( 2023-01-11T22:12:51.8871390Z [1673474288.418812] [7e0e28e30a97:28925:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8871626Z [1673474288.426152] [7e0e28e30a97:28925:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8871869Z [1673474288.426152] [7e0e28e30a97:28925:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8872143Z [1673474288.424251] [7e0e28e30a97:28926:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8872377Z [1673474288.429972] [7e0e28e30a97:28926:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8872597Z [1673474288.429972] [7e0e28e30a97:28926:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8872835Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8873065Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8873362Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8873595Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8873700Z ok (6.638s) 2023-01-11T22:12:51.8873719Z 2023-01-11T22:12:51.8873994Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8874107Z Ran 1 test in 6.638s 2023-01-11T22:12:51.8874127Z 2023-01-11T22:12:51.8874203Z OK 2023-01-11T22:12:51.8874268Z 2023-01-11T22:12:51.8874400Z Generating XML reports... 2023-01-11T22:12:51.8874855Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215803.xml 2023-01-11T22:12:51.8875226Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8875405Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8875783Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8875979Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8875999Z 2023-01-11T22:12:51.8876110Z Running tests... 
2023-01-11T22:12:51.8876373Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8876667Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8876967Z test_ddp_apply_optim_in_backward_ignored_params (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8877188Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29043 2023-01-11T22:12:51.8877406Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29044 2023-01-11T22:12:51.8877774Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8877955Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8878333Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8878523Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8878874Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8879052Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8879427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8879614Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8879860Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8880104Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8880510Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8880906Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8881138Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8881350Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8882123Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T22:12:51.8882238Z warnings.warn( 2023-01-11T22:12:51.8883074Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:738: UserWarning: DDP + apply_optim_in_backward will currently set all parameter gradients to None. If this is not the desired behavior, please set env variable DDP_OVERLAPPED_OPTIM_SET_GRADS_TO_NONE=0, and manually setgradients to None/zero as desired. 2023-01-11T22:12:51.8883186Z warnings.warn( 2023-01-11T22:12:51.8883537Z [1673474297.566290] [7e0e28e30a97:29043:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8883781Z [1673474297.571684] [7e0e28e30a97:29043:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8884021Z [1673474297.571684] [7e0e28e30a97:29043:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8884296Z [1673474297.568202] [7e0e28e30a97:29044:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.8884531Z [1673474297.575302] [7e0e28e30a97:29044:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8884768Z [1673474297.575302] [7e0e28e30a97:29044:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8884854Z ok (6.628s) 2023-01-11T22:12:51.8884874Z 2023-01-11T22:12:51.8885152Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8885268Z Ran 1 test in 6.628s 2023-01-11T22:12:51.8885287Z 2023-01-11T22:12:51.8885381Z OK 2023-01-11T22:12:51.8885399Z 2023-01-11T22:12:51.8885525Z Generating XML reports... 2023-01-11T22:12:51.8885974Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215812.xml 2023-01-11T22:12:51.8886343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8886524Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8886900Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8887073Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8887092Z 2023-01-11T22:12:51.8887203Z Running tests... 2023-01-11T22:12:51.8887468Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8887779Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8888043Z test_ddp_broadcast_buffer (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8888265Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29163 2023-01-11T22:12:51.8888482Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29164 2023-01-11T22:12:51.8888858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8889016Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8889395Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8889586Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8889955Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8890131Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8890507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8890699Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8890945Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8891248Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8891635Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8892077Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8892315Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8892545Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8892822Z [1673474306.772442] [7e0e28e30a97:29163:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8893240Z [1673474306.778366] [7e0e28e30a97:29163:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8893492Z [1673474306.778366] [7e0e28e30a97:29163:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8893766Z [1673474306.775900] [7e0e28e30a97:29164:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8894001Z [1673474306.781448] [7e0e28e30a97:29164:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8894220Z [1673474306.781448] [7e0e28e30a97:29164:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8894457Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8894694Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8894800Z ok (5.918s) 2023-01-11T22:12:51.8894823Z 2023-01-11T22:12:51.8895105Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8895223Z Ran 1 test in 5.918s 2023-01-11T22:12:51.8895243Z 2023-01-11T22:12:51.8895338Z OK 2023-01-11T22:12:51.8895357Z 2023-01-11T22:12:51.8895482Z Generating XML reports... 
2023-01-11T22:12:51.8895932Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215822.xml 2023-01-11T22:12:51.8896292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8896472Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8896857Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8897050Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8897070Z 2023-01-11T22:12:51.8897186Z Running tests... 2023-01-11T22:12:51.8897450Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8897761Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8898038Z test_ddp_broadcast_buffer_via_hook (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8898240Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29281 2023-01-11T22:12:51.8898463Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29282 2023-01-11T22:12:51.8898837Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8899013Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8899392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8899672Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8900050Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8900226Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8900603Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8900831Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8901089Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8901334Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8901744Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8902139Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8902373Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8902603Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8902882Z [1673474315.217441] [7e0e28e30a97:29282:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8903119Z [1673474315.222926] [7e0e28e30a97:29282:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8903341Z [1673474315.222926] [7e0e28e30a97:29282:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8903617Z [1673474315.211375] [7e0e28e30a97:29281:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8903851Z [1673474315.218448] [7e0e28e30a97:29281:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8904090Z [1673474315.218448] [7e0e28e30a97:29281:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8904325Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8904566Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8904801Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8905035Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:12:51.8905139Z ok (5.946s) 2023-01-11T22:12:51.8905159Z 2023-01-11T22:12:51.8905415Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8905528Z Ran 1 test in 5.946s 2023-01-11T22:12:51.8905551Z 2023-01-11T22:12:51.8905647Z OK 2023-01-11T22:12:51.8905666Z 2023-01-11T22:12:51.8905793Z Generating XML reports... 2023-01-11T22:12:51.8906245Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215830.xml 2023-01-11T22:12:51.8906617Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8906798Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8907180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8907372Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8907391Z 2023-01-11T22:12:51.8907484Z Running tests... 2023-01-11T22:12:51.8907750Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8908064Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8908402Z test_ddp_buffer_hook_allreduce (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8909213Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78641 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.619s) 2023-01-11T22:12:51.8909235Z 2023-01-11T22:12:51.8909509Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8909625Z Ran 1 test in 1.619s 2023-01-11T22:12:51.8909644Z 2023-01-11T22:12:51.8909754Z OK (skipped=1) 2023-01-11T22:12:51.8909773Z 2023-01-11T22:12:51.8909899Z Generating XML reports... 2023-01-11T22:12:51.8910326Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215839.xml 2023-01-11T22:12:51.8910700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8910879Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8911260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8911453Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8911476Z 2023-01-11T22:12:51.8911586Z Running tests... 2023-01-11T22:12:51.8911848Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8912158Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8912446Z test_ddp_buffer_hook_allreduce_return_future (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8913184Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77261 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.589s) 2023-01-11T22:12:51.8913209Z 2023-01-11T22:12:51.8913454Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8913566Z Ran 1 test in 1.590s 2023-01-11T22:12:51.8913586Z 2023-01-11T22:12:51.8913695Z OK (skipped=1) 2023-01-11T22:12:51.8913718Z 2023-01-11T22:12:51.8913842Z Generating XML reports... 2023-01-11T22:12:51.8914290Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215843.xml 2023-01-11T22:12:51.8914660Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8914836Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8915215Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8915411Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8915430Z 2023-01-11T22:12:51.8915521Z Running tests... 2023-01-11T22:12:51.8915785Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8916099Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8916393Z test_ddp_build_debug_param_to_name_mapping (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8916614Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29467 2023-01-11T22:12:51.8916831Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29468 2023-01-11T22:12:51.8917203Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8917452Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8917819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8918015Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8918379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8918598Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8918989Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8919180Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8919428Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8919671Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8920079Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8920454Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8920761Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8921033Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8921348Z [1673474331.893353] [7e0e28e30a97:29468:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8921640Z [1673474331.898178] [7e0e28e30a97:29468:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8921920Z [1673474331.898178] [7e0e28e30a97:29468:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8922132Z 2023-01-11T22:12:51.8922556Z [1673474331.884333] [7e0e28e30a97:29467:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8922835Z [1673474331.891018] [7e0e28e30a97:29467:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8923113Z [1673474331.891018] [7e0e28e30a97:29467:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8946327Z ok (5.421s) 2023-01-11T22:12:51.8946362Z 2023-01-11T22:12:51.8946708Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8946823Z Ran 1 test in 5.421s 2023-01-11T22:12:51.8946844Z 2023-01-11T22:12:51.8946920Z OK 2023-01-11T22:12:51.8946946Z 2023-01-11T22:12:51.8947063Z Generating XML reports... 
2023-01-11T22:12:51.8947529Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215847.xml 2023-01-11T22:12:51.8947901Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8948071Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8948448Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8948632Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8948653Z 2023-01-11T22:12:51.8948752Z Running tests... 2023-01-11T22:12:51.8949023Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8949322Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8949632Z test_ddp_build_debug_param_to_name_mapping_requires_grad (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8950029Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29581 2023-01-11T22:12:51.8950246Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29582 2023-01-11T22:12:51.8950630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8950880Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8951279Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8951472Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8951842Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8952000Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8952391Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8952587Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8952836Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8953085Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8953488Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8953889Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8954120Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8954329Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8954616Z [1673474339.828498] [7e0e28e30a97:29581:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8954854Z [1673474339.835452] [7e0e28e30a97:29581:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8955098Z [1673474339.835452] [7e0e28e30a97:29581:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8955372Z [1673474339.833009] [7e0e28e30a97:29582:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8955603Z [1673474339.840091] [7e0e28e30a97:29582:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8955840Z [1673474339.840091] [7e0e28e30a97:29582:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8955948Z ok (5.436s) 2023-01-11T22:12:51.8955969Z 2023-01-11T22:12:51.8956246Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8956363Z Ran 1 test in 5.436s 2023-01-11T22:12:51.8956383Z 2023-01-11T22:12:51.8956459Z OK 2023-01-11T22:12:51.8956478Z 2023-01-11T22:12:51.8956605Z Generating XML reports... 
2023-01-11T22:12:51.8957061Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215855.xml 2023-01-11T22:12:51.8957434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8957613Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8957996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8958192Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8958273Z 2023-01-11T22:12:51.8958391Z Running tests... 2023-01-11T22:12:51.8958662Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8958955Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8959221Z test_ddp_comm_hook_logging (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8959488Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29695 2023-01-11T22:12:51.8959712Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29696 2023-01-11T22:12:51.8960091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8960271Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8960650Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8960847Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8961198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8961376Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8961758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8961949Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8962197Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8962444Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8962849Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8963252Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8963482Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8963691Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8963934Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8964171Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8964450Z [1673474347.901610] [7e0e28e30a97:29695:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8964686Z [1673474347.908121] [7e0e28e30a97:29695:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8964932Z [1673474347.908121] [7e0e28e30a97:29695:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8965205Z [1673474347.908346] [7e0e28e30a97:29696:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8965438Z [1673474347.914589] [7e0e28e30a97:29696:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8965679Z [1673474347.914589] [7e0e28e30a97:29696:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8965767Z ok (6.044s) 2023-01-11T22:12:51.8965806Z 2023-01-11T22:12:51.8966060Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8966175Z Ran 1 test in 6.045s 2023-01-11T22:12:51.8966195Z 2023-01-11T22:12:51.8966287Z OK 2023-01-11T22:12:51.8966307Z 2023-01-11T22:12:51.8966432Z Generating XML reports... 
2023-01-11T22:12:51.8966951Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215903.xml 2023-01-11T22:12:51.8967321Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8967500Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8967930Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8968112Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8968150Z 2023-01-11T22:12:51.8968244Z Running tests... 2023-01-11T22:12:51.8968518Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8968833Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8969124Z test_ddp_control_flow_different_across_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8969351Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29813 2023-01-11T22:12:51.8969570Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29814 2023-01-11T22:12:51.8969944Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8970122Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8970486Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8970680Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8971051Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8971225Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8971610Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8971799Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8972047Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8972292Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8972677Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8973253Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8973494Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8973726Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8974011Z [1673474356.460508] [7e0e28e30a97:29814:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8974246Z [1673474356.465461] [7e0e28e30a97:29814:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8974489Z [1673474356.465461] [7e0e28e30a97:29814:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8974764Z [1673474356.454871] [7e0e28e30a97:29813:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8974994Z [1673474356.460773] [7e0e28e30a97:29813:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8975233Z [1673474356.460773] [7e0e28e30a97:29813:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8976198Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:12:51.8976313Z ok (6.032s) 2023-01-11T22:12:51.8976334Z 2023-01-11T22:12:51.8976603Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8976718Z Ran 1 test in 6.032s 2023-01-11T22:12:51.8976738Z 2023-01-11T22:12:51.8976832Z OK 2023-01-11T22:12:51.8976851Z 2023-01-11T22:12:51.8976980Z Generating XML reports... 2023-01-11T22:12:51.8977433Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215911.xml 2023-01-11T22:12:51.8977809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8977988Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8978372Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8978551Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8978590Z 2023-01-11T22:12:51.8978683Z Running tests... 2023-01-11T22:12:51.8978953Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8979269Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8979549Z test_ddp_control_flow_same_across_ranks (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8980299Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78235 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.631s) 2023-01-11T22:12:51.8980323Z 2023-01-11T22:12:51.8980587Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8980703Z Ran 1 test in 1.631s 2023-01-11T22:12:51.8980726Z 2023-01-11T22:12:51.8980839Z OK (skipped=1) 2023-01-11T22:12:51.8980858Z 2023-01-11T22:12:51.8980984Z Generating XML reports... 2023-01-11T22:12:51.8981413Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215920.xml 2023-01-11T22:12:51.8981786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8981965Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8982348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8982541Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8982560Z 2023-01-11T22:12:51.8982670Z Running tests... 2023-01-11T22:12:51.8982932Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.8983246Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.8983508Z test_ddp_create_graph (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.8983710Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29965 2023-01-11T22:12:51.8983929Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29966 2023-01-11T22:12:51.8984300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8984541Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8984926Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8985118Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8985548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.8985733Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.8986099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.8986290Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.8986538Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.8986784Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.8987190Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.8987586Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.8987821Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.8988053Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.8988332Z [1673474368.446780] [7e0e28e30a97:29966:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8988548Z [1673474369.217095] [7e0e28e30a97:29966:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8988789Z [1673474369.217095] [7e0e28e30a97:29966:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8989066Z [1673474368.426410] [7e0e28e30a97:29965:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.8989297Z [1673474369.222440] [7e0e28e30a97:29965:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.8989538Z [1673474369.222440] [7e0e28e30a97:29965:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.8990437Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:12:51.8991321Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:12:51.8992481Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/engine.cpp:1134.) 2023-01-11T22:12:51.8992718Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T22:12:51.8993985Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/engine.cpp:1134.) 2023-01-11T22:12:51.8994223Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T22:12:51.8994463Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8994703Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.8995589Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:12:51.8996466Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:12:51.8997329Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:12:51.8998197Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:12:51.8999058Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:12:51.8999915Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:12:51.9000777Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:12:51.9001631Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:12:51.9002551Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:12:51.9003456Z [W reducer.cpp:380] Using DistributedDataParallel with create_graph=True is not well-supported. The higher-order gradient will not be synchronized across ranks, and backpropagation through all_reduce operations will not occur. 
If you require DDP to work with higher-order gradients for your use case, please ping https://github.com/pytorch/pytorch/issues/63929 2023-01-11T22:12:51.9003568Z ok (5.433s) 2023-01-11T22:12:51.9003589Z 2023-01-11T22:12:51.9003861Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9003982Z Ran 1 test in 5.433s 2023-01-11T22:12:51.9004002Z 2023-01-11T22:12:51.9004098Z OK 2023-01-11T22:12:51.9004117Z 2023-01-11T22:12:51.9004228Z Generating XML reports... 2023-01-11T22:12:51.9004682Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215924.xml 2023-01-11T22:12:51.9005059Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9005240Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9005621Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9005814Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9005834Z 2023-01-11T22:12:51.9005945Z Running tests... 2023-01-11T22:12:51.9006211Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9006512Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9006763Z test_ddp_device (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9007507Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77324 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.614s) 2023-01-11T22:12:51.9007528Z 2023-01-11T22:12:51.9007790Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9007906Z Ran 1 test in 1.615s 2023-01-11T22:12:51.9007926Z 2023-01-11T22:12:51.9008036Z OK (skipped=1) 2023-01-11T22:12:51.9008055Z 2023-01-11T22:12:51.9008183Z Generating XML reports... 2023-01-11T22:12:51.9008628Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215932.xml 2023-01-11T22:12:51.9009006Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9009184Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9009542Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9009740Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9009760Z 2023-01-11T22:12:51.9009871Z Running tests... 2023-01-11T22:12:51.9010135Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9010447Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9010719Z test_ddp_forward_backward_hook (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9010941Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30113 2023-01-11T22:12:51.9011224Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30114 2023-01-11T22:12:51.9011600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9011760Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9012188Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9012387Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9012762Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9013350Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9013808Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9014006Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9014253Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9014481Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9014883Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9015278Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9015509Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9015738Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9016530Z /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1331: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior. 2023-01-11T22:12:51.9016864Z warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes " 2023-01-11T22:12:51.9017646Z /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1331: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior. 2023-01-11T22:12:51.9017976Z warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes " 2023-01-11T22:12:51.9018257Z [1673474381.368090] [7e0e28e30a97:30114:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9018496Z [1673474381.373353] [7e0e28e30a97:30114:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9018735Z [1673474381.373353] [7e0e28e30a97:30114:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9018992Z [1673474381.366827] [7e0e28e30a97:30113:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9019225Z [1673474381.373574] [7e0e28e30a97:30113:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9019463Z [1673474381.373574] [7e0e28e30a97:30113:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9019568Z ok (5.948s) 2023-01-11T22:12:51.9019589Z 2023-01-11T22:12:51.9019860Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9020081Z Ran 1 test in 5.948s 2023-01-11T22:12:51.9020102Z 2023-01-11T22:12:51.9020196Z OK 2023-01-11T22:12:51.9020215Z 2023-01-11T22:12:51.9020341Z Generating XML reports... 2023-01-11T22:12:51.9020792Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215936.xml 2023-01-11T22:12:51.9021207Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9021395Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9021782Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9021976Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9021995Z 2023-01-11T22:12:51.9022105Z Running tests... 2023-01-11T22:12:51.9022370Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9022685Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9022958Z test_ddp_grad_div_uneven_inputs (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9023708Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78685 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.618s) 2023-01-11T22:12:51.9023729Z 2023-01-11T22:12:51.9023989Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9024086Z Ran 1 test in 1.619s 2023-01-11T22:12:51.9024106Z 2023-01-11T22:12:51.9024216Z OK (skipped=1) 2023-01-11T22:12:51.9024236Z 2023-01-11T22:12:51.9024361Z Generating XML reports... 2023-01-11T22:12:51.9024806Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215945.xml 2023-01-11T22:12:51.9025178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9025357Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9025736Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9025934Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9025953Z 2023-01-11T22:12:51.9026045Z Running tests... 2023-01-11T22:12:51.9026307Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9026615Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9026887Z test_ddp_hook_parity_allreduce (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9027625Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77293 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.619s) 2023-01-11T22:12:51.9027648Z 2023-01-11T22:12:51.9027908Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9028021Z Ran 1 test in 1.619s 2023-01-11T22:12:51.9028044Z 2023-01-11T22:12:51.9028153Z OK (skipped=1) 2023-01-11T22:12:51.9028171Z 2023-01-11T22:12:51.9028298Z Generating XML reports... 2023-01-11T22:12:51.9028740Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215949.xml 2023-01-11T22:12:51.9029090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9029267Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9029716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9029908Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9029928Z 2023-01-11T22:12:51.9030037Z Running tests... 2023-01-11T22:12:51.9030298Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9030654Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9030952Z test_ddp_hook_parity_allreduce_process_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9031154Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30329 2023-01-11T22:12:51.9031406Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30330 2023-01-11T22:12:51.9031787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9031970Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9032352Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9032543Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9032917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9033095Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9033475Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9033648Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9033893Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9034144Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9034547Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9034943Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9035177Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9035418Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9035639Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9035875Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9036250Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9036642Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9036921Z [1673474398.195708] [7e0e28e30a97:30329:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9037160Z [1673474398.202740] [7e0e28e30a97:30329:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9037401Z [1673474398.202740] [7e0e28e30a97:30329:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9037676Z [1673474398.196487] [7e0e28e30a97:30330:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9037907Z [1673474398.202450] [7e0e28e30a97:30330:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9038213Z [1673474398.202450] [7e0e28e30a97:30330:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9038452Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9038690Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:12:51.9038952Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9039198Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9039303Z ok (6.245s) 2023-01-11T22:12:51.9039323Z 2023-01-11T22:12:51.9039596Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9039712Z Ran 1 test in 6.245s 2023-01-11T22:12:51.9039731Z 2023-01-11T22:12:51.9039827Z OK 2023-01-11T22:12:51.9039846Z 2023-01-11T22:12:51.9039973Z Generating XML reports... 2023-01-11T22:12:51.9040424Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215953.xml 2023-01-11T22:12:51.9040779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9040959Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9041343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9041539Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9041558Z 2023-01-11T22:12:51.9041669Z Running tests... 2023-01-11T22:12:51.9041936Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9042248Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9042525Z test_ddp_hook_parity_post_localSGD (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9042748Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30447 2023-01-11T22:12:51.9042948Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30448 2023-01-11T22:12:51.9043321Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9043502Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9043886Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9044077Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9044445Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9044620Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9045001Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9045171Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9045416Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9045661Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9046065Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9046462Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9046691Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9046970Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T22:12:51.9047257Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9047528Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T22:12:51.9047787Z [1673474406.936064] [7e0e28e30a97:30447:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9048077Z [1673474406.943522] [7e0e28e30a97:30447:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9048322Z [1673474406.943522] [7e0e28e30a97:30447:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9048596Z [1673474406.940346] [7e0e28e30a97:30448:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9048832Z [1673474406.947596] [7e0e28e30a97:30448:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9049069Z [1673474406.947596] [7e0e28e30a97:30448:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9049307Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9049547Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9049781Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9050012Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:12:51.9050273Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T22:12:51.9050549Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T22:12:51.9050830Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T22:12:51.9051104Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 10 iterations 2023-01-11T22:12:51.9051338Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9051573Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9051805Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9052035Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9052308Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T22:12:51.9052564Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Start to apply local SGD after 10 iterations. 2023-01-11T22:12:51.9052844Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 1000 iterations 2023-01-11T22:12:51.9053515Z INFO:torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook:Local SGD will be started after 1000 iterations 2023-01-11T22:12:51.9053755Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9053994Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9054225Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:12:51.9054457Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9054561Z ok (6.546s) 2023-01-11T22:12:51.9054581Z 2023-01-11T22:12:51.9054848Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9054962Z Ran 1 test in 6.546s 2023-01-11T22:12:51.9055070Z 2023-01-11T22:12:51.9055170Z OK 2023-01-11T22:12:51.9055189Z 2023-01-11T22:12:51.9055317Z Generating XML reports... 2023-01-11T22:12:51.9055771Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220002.xml 2023-01-11T22:12:51.9056145Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9056384Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9056778Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9056976Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9056996Z 2023-01-11T22:12:51.9057088Z Running tests... 2023-01-11T22:12:51.9057355Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9057667Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9057942Z test_ddp_hook_parity_powerSGD (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9058691Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77378 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. 
(1.606s) 2023-01-11T22:12:51.9058711Z 2023-01-11T22:12:51.9058973Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9059087Z Ran 1 test in 1.606s 2023-01-11T22:12:51.9059106Z 2023-01-11T22:12:51.9059216Z OK (skipped=1) 2023-01-11T22:12:51.9059235Z 2023-01-11T22:12:51.9059360Z Generating XML reports... 2023-01-11T22:12:51.9059784Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220011.xml 2023-01-11T22:12:51.9060158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9060335Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9060746Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9060940Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9060963Z 2023-01-11T22:12:51.9061076Z Running tests... 2023-01-11T22:12:51.9061339Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9061650Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9061925Z test_ddp_hook_pickling_powerSGD (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9062128Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30599 2023-01-11T22:12:51.9062352Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30600 2023-01-11T22:12:51.9062724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9062901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9063282Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9063473Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9063839Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9064015Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9064389Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9064632Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9064876Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9065122Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9065535Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9065982Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9066218Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9066764Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 4; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:12:51.9066999Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9067544Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 4; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:12:51.9067783Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9067999Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9068275Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Start to apply PowerSGD after 4 iterations. 2023-01-11T22:12:51.9068547Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Start to apply PowerSGD after 4 iterations. 2023-01-11T22:12:51.9068849Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:A zero tensor of length 10 that represents local error is created. 2023-01-11T22:12:51.9069147Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:A zero tensor of length 10 that represents local error is created. 
2023-01-11T22:12:51.9069479Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Compression stats: iter 4, total before compression 10, total after compression 10, rate 1.0 2023-01-11T22:12:51.9069807Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Allocating contiguous memory of length 0 for Ps, and of length 0 for Qs, respectively. 2023-01-11T22:12:51.9070133Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Compression stats: iter 4, total before compression 10, total after compression 10, rate 1.0 2023-01-11T22:12:51.9070455Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:Allocating contiguous memory of length 0 for Ps, and of length 0 for Qs, respectively. 2023-01-11T22:12:51.9070738Z [1673474420.162929] [7e0e28e30a97:30599:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9070975Z [1673474420.168875] [7e0e28e30a97:30599:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9071204Z [1673474420.168875] [7e0e28e30a97:30599:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9071480Z [1673474420.164591] [7e0e28e30a97:30600:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9071715Z [1673474420.169548] [7e0e28e30a97:30600:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9071954Z [1673474420.169548] [7e0e28e30a97:30600:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9072250Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9072487Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:12:51.9072591Z ok (6.047s) 2023-01-11T22:12:51.9072611Z 2023-01-11T22:12:51.9072888Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9073051Z Ran 1 test in 6.047s 2023-01-11T22:12:51.9073073Z 2023-01-11T22:12:51.9073151Z OK 2023-01-11T22:12:51.9073169Z 2023-01-11T22:12:51.9073298Z Generating XML reports... 2023-01-11T22:12:51.9073760Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220015.xml 2023-01-11T22:12:51.9074134Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9074318Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9074698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9074891Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9074910Z 2023-01-11T22:12:51.9075021Z Running tests... 2023-01-11T22:12:51.9075289Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9075588Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9075981Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:12:51.9076001Z 2023-01-11T22:12:51.9076261Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9076377Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9076400Z 2023-01-11T22:12:51.9076512Z OK (skipped=1) 2023-01-11T22:12:51.9076532Z 2023-01-11T22:12:51.9076658Z Generating XML reports... 
2023-01-11T22:12:51.9077100Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220024.xml 2023-01-11T22:12:51.9077471Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9077651Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9078014Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9078205Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9078225Z 2023-01-11T22:12:51.9078335Z Running tests... 2023-01-11T22:12:51.9078597Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9078911Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9079305Z test_ddp_hook_with_optimizer_parity_adam_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:12:51.9079326Z 2023-01-11T22:12:51.9079585Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9079699Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9079722Z 2023-01-11T22:12:51.9079832Z OK (skipped=1) 2023-01-11T22:12:51.9079851Z 2023-01-11T22:12:51.9079958Z Generating XML reports... 
2023-01-11T22:12:51.9080400Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220026.xml 2023-01-11T22:12:51.9080770Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9080945Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9081392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9081586Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9081606Z 2023-01-11T22:12:51.9081715Z Running tests... 2023-01-11T22:12:51.9081975Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9082310Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9082768Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:12:51.9082806Z 2023-01-11T22:12:51.9083050Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9083164Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9083187Z 2023-01-11T22:12:51.9083297Z OK (skipped=1) 2023-01-11T22:12:51.9083316Z 2023-01-11T22:12:51.9083440Z Generating XML reports... 
2023-01-11T22:12:51.9083879Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220028.xml 2023-01-11T22:12:51.9084247Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9084428Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9084807Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9084980Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9084999Z 2023-01-11T22:12:51.9085107Z Running tests... 2023-01-11T22:12:51.9085365Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9085673Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9086119Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:12:51.9086139Z 2023-01-11T22:12:51.9086397Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9086514Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9086534Z 2023-01-11T22:12:51.9086642Z OK (skipped=1) 2023-01-11T22:12:51.9086662Z 2023-01-11T22:12:51.9086786Z Generating XML reports... 
2023-01-11T22:12:51.9087206Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220031.xml 2023-01-11T22:12:51.9087571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9087752Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9088130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9088323Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9088342Z 2023-01-11T22:12:51.9088452Z Running tests... 2023-01-11T22:12:51.9088716Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9089027Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9089466Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:12:51.9089487Z 2023-01-11T22:12:51.9089743Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9089904Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9089923Z 2023-01-11T22:12:51.9090033Z OK (skipped=1) 2023-01-11T22:12:51.9090053Z 2023-01-11T22:12:51.9090178Z Generating XML reports... 
2023-01-11T22:12:51.9090625Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220033.xml 2023-01-11T22:12:51.9090990Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9091217Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9091607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9091799Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9091819Z 2023-01-11T22:12:51.9091910Z Running tests... 2023-01-11T22:12:51.9092169Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9092483Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9093069Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_False_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:12:51.9093091Z 2023-01-11T22:12:51.9093369Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9093487Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9093507Z 2023-01-11T22:12:51.9093617Z OK (skipped=1) 2023-01-11T22:12:51.9093637Z 2023-01-11T22:12:51.9093762Z Generating XML reports... 
2023-01-11T22:12:51.9094213Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220036.xml 2023-01-11T22:12:51.9094583Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9094749Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9095127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9095319Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9095338Z 2023-01-11T22:12:51.9095448Z Running tests... 2023-01-11T22:12:51.9095718Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9096030Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9096476Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:12:51.9096496Z 2023-01-11T22:12:51.9096755Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9096872Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9096891Z 2023-01-11T22:12:51.9096982Z OK (skipped=1) 2023-01-11T22:12:51.9097001Z 2023-01-11T22:12:51.9097126Z Generating XML reports... 
2023-01-11T22:12:51.9097571Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220038.xml 2023-01-11T22:12:51.9097942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9098119Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9098495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9098687Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9098707Z 2023-01-11T22:12:51.9098815Z Running tests... 2023-01-11T22:12:51.9099058Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9099473Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9099913Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_False_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:12:51.9099935Z 2023-01-11T22:12:51.9100255Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9100374Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9100393Z 2023-01-11T22:12:51.9100503Z OK (skipped=1) 2023-01-11T22:12:51.9100523Z 2023-01-11T22:12:51.9100650Z Generating XML reports... 
2023-01-11T22:12:51.9101101Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220040.xml 2023-01-11T22:12:51.9101473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9101655Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9102014Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9102206Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9102225Z 2023-01-11T22:12:51.9102335Z Running tests... 2023-01-11T22:12:51.9102601Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9102912Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9103355Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:12:51.9103375Z 2023-01-11T22:12:51.9103640Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9103754Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9103773Z 2023-01-11T22:12:51.9103883Z OK (skipped=1) 2023-01-11T22:12:51.9103902Z 2023-01-11T22:12:51.9104009Z Generating XML reports... 
2023-01-11T22:12:51.9104454Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220043.xml 2023-01-11T22:12:51.9104829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9105006Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9105384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9105578Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9105598Z 2023-01-11T22:12:51.9105709Z Running tests... 2023-01-11T22:12:51.9105980Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9106293Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9106715Z test_ddp_hook_with_optimizer_parity_adamw_grad_as_bucket_view_True_static_graph_True_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:12:51.9106756Z 2023-01-11T22:12:51.9107001Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9107115Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9107135Z 2023-01-11T22:12:51.9107244Z OK (skipped=1) 2023-01-11T22:12:51.9107263Z 2023-01-11T22:12:51.9107388Z Generating XML reports... 
2023-01-11T22:12:51.9107833Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220045.xml 2023-01-11T22:12:51.9108201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9108443Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9108831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9109004Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9109043Z 2023-01-11T22:12:51.9109224Z Running tests... 2023-01-11T22:12:51.9109496Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9109807Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9110194Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_False (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:12:51.9110215Z 2023-01-11T22:12:51.9110478Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9110593Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9110613Z 2023-01-11T22:12:51.9110724Z OK (skipped=1) 2023-01-11T22:12:51.9110743Z 2023-01-11T22:12:51.9110868Z Generating XML reports... 
2023-01-11T22:12:51.9111290Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220048.xml 2023-01-11T22:12:51.9111662Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9111840Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9112216Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9112409Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9112429Z 2023-01-11T22:12:51.9112537Z Running tests... 2023-01-11T22:12:51.9112803Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9113113Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9113494Z test_ddp_hook_with_optimizer_parity_sgd_optimize_subset_True (__main__.TestDistBackendWithSpawn) ... skip: Issues with async error handling, see https://github.com/pytorch/pytorch/issues/73259 (0.002s) 2023-01-11T22:12:51.9113514Z 2023-01-11T22:12:51.9113756Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9113871Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9113890Z 2023-01-11T22:12:51.9113999Z OK (skipped=1) 2023-01-11T22:12:51.9114018Z 2023-01-11T22:12:51.9114142Z Generating XML reports... 
2023-01-11T22:12:51.9114581Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220050.xml 2023-01-11T22:12:51.9114945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9115125Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9115502Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9115692Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9115711Z 2023-01-11T22:12:51.9115802Z Running tests... 2023-01-11T22:12:51.9116065Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9116376Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9116642Z test_ddp_ignore_params_arg (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9117387Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77325 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.626s) 2023-01-11T22:12:51.9117464Z 2023-01-11T22:12:51.9117735Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9117851Z Ran 1 test in 1.627s 2023-01-11T22:12:51.9117871Z 2023-01-11T22:12:51.9117983Z OK (skipped=1) 2023-01-11T22:12:51.9118002Z 2023-01-11T22:12:51.9118129Z Generating XML reports... 
2023-01-11T22:12:51.9118623Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220052.xml 2023-01-11T22:12:51.9118988Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9119165Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9119546Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9119743Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9119763Z 2023-01-11T22:12:51.9119872Z Running tests... 2023-01-11T22:12:51.9120136Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9120446Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9120708Z test_ddp_inference (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9120910Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31147 2023-01-11T22:12:51.9121129Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31148 2023-01-11T22:12:51.9121500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9121677Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9122058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9122255Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9122622Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9122801Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9123179Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9123354Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9123601Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9123848Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9124250Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9124653Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9124883Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9125115Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9125399Z [1673474461.507215] [7e0e28e30a97:31147:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9125635Z [1673474461.512711] [7e0e28e30a97:31147:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9125858Z [1673474461.512711] [7e0e28e30a97:31147:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9126138Z [1673474461.511515] [7e0e28e30a97:31148:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9126432Z [1673474461.517982] [7e0e28e30a97:31148:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9126671Z [1673474461.517982] [7e0e28e30a97:31148:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9126777Z ok (6.145s) 2023-01-11T22:12:51.9126837Z 2023-01-11T22:12:51.9127117Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9127234Z Ran 1 test in 6.145s 2023-01-11T22:12:51.9127253Z 2023-01-11T22:12:51.9127349Z OK 2023-01-11T22:12:51.9127368Z 2023-01-11T22:12:51.9127494Z Generating XML reports... 
2023-01-11T22:12:51.9127924Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220056.xml 2023-01-11T22:12:51.9128299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9128484Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9128862Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9129057Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9129077Z 2023-01-11T22:12:51.9129190Z Running tests... 2023-01-11T22:12:51.9129456Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9129772Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9130045Z test_ddp_join_model_equivalence (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9130248Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31261 2023-01-11T22:12:51.9130470Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31262 2023-01-11T22:12:51.9130845Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9131024Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9131405Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9131599Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9131993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9132172Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9132532Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9132723Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9133255Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9133550Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9133962Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9134365Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9134597Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9134825Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9135061Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9135278Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9135654Z [1673474470.661099] [7e0e28e30a97:31262:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9135888Z [1673474470.668413] [7e0e28e30a97:31262:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9136185Z [1673474470.668413] [7e0e28e30a97:31262:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9136459Z [1673474470.656164] [7e0e28e30a97:31261:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9136691Z [1673474470.663352] [7e0e28e30a97:31261:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9136928Z [1673474470.663352] [7e0e28e30a97:31261:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9137346Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T22:12:51.9137514Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T22:12:51.9137912Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T22:12:51.9138057Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T22:12:51.9138165Z ok (5.937s) 2023-01-11T22:12:51.9138186Z 2023-01-11T22:12:51.9138455Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9138570Z Ran 1 test in 5.937s 2023-01-11T22:12:51.9138590Z 2023-01-11T22:12:51.9138684Z OK 2023-01-11T22:12:51.9138703Z 2023-01-11T22:12:51.9138829Z Generating XML reports... 2023-01-11T22:12:51.9139277Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220105.xml 2023-01-11T22:12:51.9139654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9139812Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9140194Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9140389Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9140413Z 2023-01-11T22:12:51.9140524Z Running tests... 
2023-01-11T22:12:51.9140787Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9141097Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9141362Z test_ddp_logging_data_cpu (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9141583Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31379 2023-01-11T22:12:51.9141787Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31380 2023-01-11T22:12:51.9142160Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9142338Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9142713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9142907Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9143274Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9143452Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9143828Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9144017Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9144307Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9144552Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9144959Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9145405Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9145641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9145869Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9146107Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9146343Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9146626Z [1673474478.005038] [7e0e28e30a97:31380:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9146843Z [1673474478.779955] [7e0e28e30a97:31380:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9147086Z [1673474478.779955] [7e0e28e30a97:31380:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9147361Z [1673474477.984686] [7e0e28e30a97:31379:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9147591Z [1673474478.763698] [7e0e28e30a97:31379:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9147827Z [1673474478.763698] [7e0e28e30a97:31379:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9147934Z ok (5.551s) 2023-01-11T22:12:51.9147955Z 2023-01-11T22:12:51.9148230Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9148345Z Ran 1 test in 5.552s 2023-01-11T22:12:51.9148365Z 2023-01-11T22:12:51.9148458Z OK 2023-01-11T22:12:51.9148477Z 2023-01-11T22:12:51.9148584Z Generating XML reports... 
2023-01-11T22:12:51.9149032Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220114.xml 2023-01-11T22:12:51.9149404Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9149582Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9149960Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9150153Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9150176Z 2023-01-11T22:12:51.9150287Z Running tests... 2023-01-11T22:12:51.9150552Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9150862Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9151108Z test_ddp_logging_data_gpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9151333Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31523 2023-01-11T22:12:51.9151554Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31524 2023-01-11T22:12:51.9151924Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9152101Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9152478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9152735Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9153105Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9153262Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9153681Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9153875Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9154123Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9154529Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9154770Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9155171Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9155402Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9155631Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9155891Z [1673474486.764525] [7e0e28e30a97:31524:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9156126Z [1673474486.770087] [7e0e28e30a97:31524:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9156367Z [1673474486.770087] [7e0e28e30a97:31524:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9156643Z [1673474486.764174] [7e0e28e30a97:31523:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9156876Z [1673474486.771468] [7e0e28e30a97:31523:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9157112Z [1673474486.771468] [7e0e28e30a97:31523:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9157353Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9157589Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9157693Z ok (5.938s) 2023-01-11T22:12:51.9157715Z 2023-01-11T22:12:51.9157986Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9158082Z Ran 1 test in 5.938s 2023-01-11T22:12:51.9158101Z 2023-01-11T22:12:51.9158194Z OK 2023-01-11T22:12:51.9158213Z 2023-01-11T22:12:51.9158337Z Generating XML reports... 
2023-01-11T22:12:51.9158789Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220122.xml 2023-01-11T22:12:51.9159158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9159338Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9159719Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9159911Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9159931Z 2023-01-11T22:12:51.9160022Z Running tests... 2023-01-11T22:12:51.9160286Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9160594Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9160879Z test_ddp_model_diff_num_params_across_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9161155Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31641 2023-01-11T22:12:51.9161375Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31642 2023-01-11T22:12:51.9161750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9161970Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9162357Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9162530Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9162896Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9163072Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9163455Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9163644Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9163889Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9164136Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9164543Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9164939Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9165152Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9165381Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9165625Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9165864Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9166255Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9166644Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9166885Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:12:51.9167125Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:12:51.9167508Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:12:51.9167882Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:12:51.9168160Z [1673474495.331335] [7e0e28e30a97:31641:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9168394Z [1673474495.336792] [7e0e28e30a97:31641:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9168636Z [1673474495.336792] [7e0e28e30a97:31641:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9168906Z [1673474495.339152] [7e0e28e30a97:31642:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. 
Fall back to non cooperative launch. 2023-01-11T22:12:51.9169137Z [1673474495.344047] [7e0e28e30a97:31642:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9169434Z [1673474495.344047] [7e0e28e30a97:31642:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9169541Z ok (5.541s) 2023-01-11T22:12:51.9169561Z 2023-01-11T22:12:51.9169838Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9169934Z Ran 1 test in 5.541s 2023-01-11T22:12:51.9169973Z 2023-01-11T22:12:51.9170048Z OK 2023-01-11T22:12:51.9170066Z 2023-01-11T22:12:51.9170191Z Generating XML reports... 2023-01-11T22:12:51.9170700Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220130.xml 2023-01-11T22:12:51.9171083Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9171262Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9171640Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9171840Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9171860Z 2023-01-11T22:12:51.9171970Z Running tests... 2023-01-11T22:12:51.9172215Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9172528Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9172812Z test_ddp_model_diff_shape_across_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9173181Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31761 2023-01-11T22:12:51.9173403Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31762 2023-01-11T22:12:51.9173779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9173956Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9174341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9174530Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9174880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9175059Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9175436Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9175629Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9175874Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9176118Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9176518Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9176923Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9177133Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9177367Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9177608Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9177849Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9178241Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9178630Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9178960Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:12:51.9179205Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:12:51.9179598Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:12:51.9180049Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:12:51.9180321Z [1673474503.330016] [7e0e28e30a97:31762:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9180554Z [1673474503.335958] [7e0e28e30a97:31762:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9180797Z [1673474503.335958] [7e0e28e30a97:31762:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9181192Z [1673474513.739869] [7e0e28e30a97:31762:0] tag_match.c:62 UCX WARN unexpected tag-receive descriptor 0x251e02c0 was not matched 2023-01-11T22:12:51.9181463Z [1673474503.327542] [7e0e28e30a97:31761:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9181697Z [1673474503.333210] [7e0e28e30a97:31761:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9181934Z [1673474503.333210] [7e0e28e30a97:31761:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9182250Z [1673474513.703160] [7e0e28e30a97:31761:1] ucc_schedule.h:189 UCC WARN timeout 10 sec. has expired on req 0x23868500, seq_num 3, TL_UCP, team_id 1, size 2, rank 0, ctx_rank 0: Barrier n/a inplace=0 bytes=0 2023-01-11T22:12:51.9182528Z [1673474513.750027] [7e0e28e30a97:31761:0] mpool.c:55 UCX WARN object 0x239a3c00 {flags:0x20040 recv length 0 host memory} was not returned to mpool ucp_requests 2023-01-11T22:12:51.9182632Z ok (15.535s) 2023-01-11T22:12:51.9182653Z 2023-01-11T22:12:51.9182906Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9183023Z Ran 1 test in 15.535s 2023-01-11T22:12:51.9183043Z 2023-01-11T22:12:51.9183136Z OK 2023-01-11T22:12:51.9183158Z 2023-01-11T22:12:51.9183285Z Generating XML reports... 
2023-01-11T22:12:51.9183735Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220138.xml 2023-01-11T22:12:51.9184107Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9184287Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9184669Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9184868Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9184888Z 2023-01-11T22:12:51.9184980Z Running tests... 2023-01-11T22:12:51.9185250Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9185568Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9185879Z test_ddp_multiple_nested_unused_params_err_ignore_params (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9186103Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31881 2023-01-11T22:12:51.9186319Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31882 2023-01-11T22:12:51.9186695Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9186936Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9187303Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9187494Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9187860Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9188086Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9188474Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9188666Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9188911Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9189157Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9189557Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9189932Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9190165Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9190398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9190676Z [1673474521.420229] [7e0e28e30a97:31882:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9190910Z [1673474521.425257] [7e0e28e30a97:31882:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9191152Z [1673474521.425257] [7e0e28e30a97:31882:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9191430Z [1673474521.412647] [7e0e28e30a97:31881:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9191657Z [1673474521.419341] [7e0e28e30a97:31881:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9191898Z [1673474521.419341] [7e0e28e30a97:31881:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9192004Z ok (6.124s) 2023-01-11T22:12:51.9192025Z 2023-01-11T22:12:51.9192277Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9192391Z Ran 1 test in 6.125s 2023-01-11T22:12:51.9192410Z 2023-01-11T22:12:51.9192505Z OK 2023-01-11T22:12:51.9192525Z 2023-01-11T22:12:51.9192650Z Generating XML reports... 
2023-01-11T22:12:51.9193099Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220156.xml 2023-01-11T22:12:51.9193475Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9193655Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9194036Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9194213Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9194251Z 2023-01-11T22:12:51.9194344Z Running tests... 2023-01-11T22:12:51.9194610Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9194923Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9195211Z test_ddp_multiple_nested_unused_params_error (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9195504Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31999 2023-01-11T22:12:51.9195722Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32000 2023-01-11T22:12:51.9196101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9196279Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9196685Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9196885Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9197261Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9197439Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9197817Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9198012Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9198259Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9198506Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9198910Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9199289Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9199520Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9199749Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9200028Z [1673474529.999256] [7e0e28e30a97:31999:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9200268Z [1673474530.005311] [7e0e28e30a97:31999:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9200508Z [1673474530.005311] [7e0e28e30a97:31999:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9200783Z [1673474530.005062] [7e0e28e30a97:32000:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9201017Z [1673474530.010751] [7e0e28e30a97:32000:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9201255Z [1673474530.010751] [7e0e28e30a97:32000:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9201343Z ok (6.123s) 2023-01-11T22:12:51.9201385Z 2023-01-11T22:12:51.9201637Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9201751Z Ran 1 test in 6.123s 2023-01-11T22:12:51.9201770Z 2023-01-11T22:12:51.9201865Z OK 2023-01-11T22:12:51.9201884Z 2023-01-11T22:12:51.9202011Z Generating XML reports... 
2023-01-11T22:12:51.9202465Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220205.xml 2023-01-11T22:12:51.9202839Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9203021Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9203400Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9203576Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9203595Z 2023-01-11T22:12:51.9203767Z Running tests... 2023-01-11T22:12:51.9204039Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9204353Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9204613Z test_ddp_namedtuple (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9204834Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32117 2023-01-11T22:12:51.9205098Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32118 2023-01-11T22:12:51.9205488Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9205647Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9206024Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9206217Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9206591Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9206769Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9207149Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9207345Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9207592Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9207837Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9208221Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9208616Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9208852Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9209081Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9209359Z [1673474538.670718] [7e0e28e30a97:32118:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9209597Z [1673474538.677506] [7e0e28e30a97:32118:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9209838Z [1673474538.677506] [7e0e28e30a97:32118:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9210110Z [1673474538.670118] [7e0e28e30a97:32117:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9210346Z [1673474538.677108] [7e0e28e30a97:32117:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9210582Z [1673474538.677108] [7e0e28e30a97:32117:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9210668Z ok (6.031s) 2023-01-11T22:12:51.9210688Z 2023-01-11T22:12:51.9210963Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9211081Z Ran 1 test in 6.032s 2023-01-11T22:12:51.9211101Z 2023-01-11T22:12:51.9211196Z OK 2023-01-11T22:12:51.9211215Z 2023-01-11T22:12:51.9211341Z Generating XML reports... 
2023-01-11T22:12:51.9211789Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220214.xml 2023-01-11T22:12:51.9212160Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9212337Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9212770Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9213256Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9213291Z 2023-01-11T22:12:51.9213428Z Running tests... 2023-01-11T22:12:51.9213703Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9214102Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9214373Z test_ddp_new_tensor_in_fwd (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9214596Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32231 2023-01-11T22:12:51.9214814Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32232 2023-01-11T22:12:51.9215189Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9215352Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9215730Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9215922Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9216293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9216468Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9216843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9217035Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9217282Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9217512Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9217913Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9218311Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9218543Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9218774Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9219051Z [1673474547.278967] [7e0e28e30a97:32232:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9219285Z [1673474547.285870] [7e0e28e30a97:32232:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9219529Z [1673474547.285870] [7e0e28e30a97:32232:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9220310Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:12:51.9220588Z [1673474547.269305] [7e0e28e30a97:32231:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9220820Z [1673474547.274727] [7e0e28e30a97:32231:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9221135Z [1673474547.274727] [7e0e28e30a97:32231:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9221951Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. 
This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:12:51.9222194Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9222412Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9222521Z ok (5.924s) 2023-01-11T22:12:51.9222541Z 2023-01-11T22:12:51.9222820Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9222934Z Ran 1 test in 5.925s 2023-01-11T22:12:51.9222954Z 2023-01-11T22:12:51.9223049Z OK 2023-01-11T22:12:51.9223068Z 2023-01-11T22:12:51.9223196Z Generating XML reports... 2023-01-11T22:12:51.9223650Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220222.xml 2023-01-11T22:12:51.9224020Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9224179Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9224556Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9224749Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9224772Z 2023-01-11T22:12:51.9224882Z Running tests... 
2023-01-11T22:12:51.9225150Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9225461Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9225743Z test_ddp_new_tensor_in_fwd_static_graph (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9226492Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78338 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.641s) 2023-01-11T22:12:51.9226513Z 2023-01-11T22:12:51.9226782Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9226898Z Ran 1 test in 1.641s 2023-01-11T22:12:51.9226918Z 2023-01-11T22:12:51.9227008Z OK (skipped=1) 2023-01-11T22:12:51.9227031Z 2023-01-11T22:12:51.9227162Z Generating XML reports... 2023-01-11T22:12:51.9227607Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220231.xml 2023-01-11T22:12:51.9227977Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9228154Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9228537Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9228728Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9228747Z 2023-01-11T22:12:51.9228858Z Running tests... 
2023-01-11T22:12:51.9229120Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9229413Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9229758Z test_ddp_profiling_autograd_profiler (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9230504Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77342 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.615s) 2023-01-11T22:12:51.9230525Z 2023-01-11T22:12:51.9230848Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9230967Z Ran 1 test in 1.615s 2023-01-11T22:12:51.9230986Z 2023-01-11T22:12:51.9231097Z OK (skipped=1) 2023-01-11T22:12:51.9231116Z 2023-01-11T22:12:51.9231243Z Generating XML reports... 2023-01-11T22:12:51.9231691Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220235.xml 2023-01-11T22:12:51.9232063Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9232223Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9232631Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9232822Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9232841Z 2023-01-11T22:12:51.9232952Z Running tests... 
2023-01-11T22:12:51.9233220Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9233530Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9233807Z test_ddp_profiling_torch_profiler (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9234033Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32417 2023-01-11T22:12:51.9234255Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32418 2023-01-11T22:12:51.9234610Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9234787Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9235166Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9235360Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9235729Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9235907Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9236283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9236472Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9236704Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9236949Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9237350Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9237754Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9237985Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9238211Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9238491Z [1673474564.082382] [7e0e28e30a97:32417:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9238794Z [1673474564.088553] [7e0e28e30a97:32417:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9239037Z [1673474564.088553] [7e0e28e30a97:32417:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9239390Z STAGE:2023-01-11 22:02:44 32417:32417 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9239693Z [1673474564.085270] [7e0e28e30a97:32418:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9239934Z [1673474564.090809] [7e0e28e30a97:32418:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9240174Z [1673474564.090809] [7e0e28e30a97:32418:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9240519Z STAGE:2023-01-11 22:02:44 32418:32418 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9240761Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9241002Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:12:51.9241341Z STAGE:2023-01-11 22:02:45 32417:32417 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9241676Z STAGE:2023-01-11 22:02:45 32418:32418 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9242028Z STAGE:2023-01-11 22:02:45 32418:32418 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9242356Z STAGE:2023-01-11 22:02:45 32417:32417 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9243133Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:12:51.9243928Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:12:51.9244267Z STAGE:2023-01-11 22:02:45 32417:32417 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9244579Z STAGE:2023-01-11 22:02:45 32418:32418 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9244913Z STAGE:2023-01-11 22:02:45 32417:32417 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9245260Z STAGE:2023-01-11 22:02:45 32417:32417 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9245592Z STAGE:2023-01-11 22:02:45 32418:32418 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9245936Z STAGE:2023-01-11 22:02:45 32418:32418 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9246043Z ok (6.666s) 2023-01-11T22:12:51.9246064Z 2023-01-11T22:12:51.9246332Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9246446Z Ran 1 test in 6.666s 2023-01-11T22:12:51.9246466Z 2023-01-11T22:12:51.9246561Z OK 2023-01-11T22:12:51.9246581Z 2023-01-11T22:12:51.9246689Z Generating XML reports... 2023-01-11T22:12:51.9247247Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220239.xml 2023-01-11T22:12:51.9247618Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9247799Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9248230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9248429Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9248449Z 2023-01-11T22:12:51.9248561Z Running tests... 
2023-01-11T22:12:51.9248834Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9249128Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9249399Z test_ddp_python_error_logged (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9249628Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32539 2023-01-11T22:12:51.9249848Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32540 2023-01-11T22:12:51.9250222Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9250404Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9250786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9250979Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9251348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9251505Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9251886Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9252078Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9252323Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9252569Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9253139Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9253552Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9253784Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9253995Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9254276Z [1673474573.179981] [7e0e28e30a97:32539:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9254514Z [1673474573.187355] [7e0e28e30a97:32539:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9254761Z [1673474573.187355] [7e0e28e30a97:32539:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9255036Z [1673474573.188650] [7e0e28e30a97:32540:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9255267Z [1673474573.194453] [7e0e28e30a97:32540:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9255506Z [1673474573.194453] [7e0e28e30a97:32540:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9255699Z ok (5.435s) 2023-01-11T22:12:51.9255720Z 2023-01-11T22:12:51.9255995Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9256112Z Ran 1 test in 5.435s 2023-01-11T22:12:51.9256131Z 2023-01-11T22:12:51.9256207Z OK 2023-01-11T22:12:51.9256226Z 2023-01-11T22:12:51.9256354Z Generating XML reports... 
2023-01-11T22:12:51.9256872Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220248.xml 2023-01-11T22:12:51.9257263Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9257441Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9257818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9258012Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9258036Z 2023-01-11T22:12:51.9258148Z Running tests... 2023-01-11T22:12:51.9258417Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9258711Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9258991Z test_ddp_returns_tensor_with_no_grad (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9259745Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78595 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.628s) 2023-01-11T22:12:51.9259767Z 2023-01-11T22:12:51.9260032Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9260146Z Ran 1 test in 1.629s 2023-01-11T22:12:51.9260166Z 2023-01-11T22:12:51.9260277Z OK (skipped=1) 2023-01-11T22:12:51.9260300Z 2023-01-11T22:12:51.9260429Z Generating XML reports... 
2023-01-11T22:12:51.9260875Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220256.xml 2023-01-11T22:12:51.9261245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9261406Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9261787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9261983Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9262003Z 2023-01-11T22:12:51.9262112Z Running tests... 2023-01-11T22:12:51.9262378Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9262691Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9262976Z test_ddp_shared_grad_acc_unused_params (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9263198Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32687 2023-01-11T22:12:51.9263416Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32688 2023-01-11T22:12:51.9263771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9263952Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9264335Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9264526Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9264889Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9265125Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9265508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9265701Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9265926Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9266216Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9266633Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9267029Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9267261Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9267493Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9268405Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:12:51.9268522Z warnings.warn( 2023-01-11T22:12:51.9269425Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:12:51.9269538Z warnings.warn( 2023-01-11T22:12:51.9269784Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9270003Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9270280Z [1673474585.321603] [7e0e28e30a97:32688:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9270518Z [1673474585.326478] [7e0e28e30a97:32688:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9270760Z [1673474585.326478] [7e0e28e30a97:32688:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9271032Z [1673474585.315938] [7e0e28e30a97:32687:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9271263Z [1673474585.321844] [7e0e28e30a97:32687:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9271504Z [1673474585.321844] [7e0e28e30a97:32687:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9271607Z ok (5.831s) 2023-01-11T22:12:51.9271627Z 2023-01-11T22:12:51.9271901Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9272014Z Ran 1 test in 5.832s 2023-01-11T22:12:51.9272034Z 2023-01-11T22:12:51.9272112Z OK 2023-01-11T22:12:51.9272131Z 2023-01-11T22:12:51.9272258Z Generating XML reports... 2023-01-11T22:12:51.9272711Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220300.xml 2023-01-11T22:12:51.9273084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9273263Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9273713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9274860Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9274882Z 2023-01-11T22:12:51.9274993Z Running tests... 
2023-01-11T22:12:51.9275269Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9275628Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9275913Z test_ddp_static_graph_nested_types (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9276669Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77625 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.625s) 2023-01-11T22:12:51.9276695Z 2023-01-11T22:12:51.9276963Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9277079Z Ran 1 test in 1.625s 2023-01-11T22:12:51.9277098Z 2023-01-11T22:12:51.9277209Z OK (skipped=1) 2023-01-11T22:12:51.9277229Z 2023-01-11T22:12:51.9277356Z Generating XML reports... 2023-01-11T22:12:51.9277805Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220309.xml 2023-01-11T22:12:51.9278180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9278338Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9278721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9278915Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9278934Z 2023-01-11T22:12:51.9279046Z Running tests... 
2023-01-11T22:12:51.9279315Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9279627Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9279899Z test_ddp_sync_bn_training_vs_eval (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9280121Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32839 2023-01-11T22:12:51.9280342Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32840 2023-01-11T22:12:51.9280695Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9280876Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9281256Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9281448Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9281822Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9282000Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9282379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9282575Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9282806Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9283052Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9283457Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9283858Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9284152Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9284381Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9284661Z [1673474597.909041] [7e0e28e30a97:32839:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9284944Z [1673474597.916027] [7e0e28e30a97:32839:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9285190Z [1673474597.916027] [7e0e28e30a97:32839:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9285538Z STAGE:2023-01-11 22:03:18 32839:32839 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9285796Z [1673474597.918741] [7e0e28e30a97:32840:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9286032Z [1673474597.924468] [7e0e28e30a97:32840:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9286274Z [1673474597.924468] [7e0e28e30a97:32840:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9286617Z STAGE:2023-01-11 22:03:18 32840:32840 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9286857Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:12:51.9287088Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:12:51.9287430Z STAGE:2023-01-11 22:03:18 32839:32839 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9287763Z STAGE:2023-01-11 22:03:18 32840:32840 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9288118Z STAGE:2023-01-11 22:03:18 32840:32840 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9288446Z STAGE:2023-01-11 22:03:18 32839:32839 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9288775Z STAGE:2023-01-11 22:03:18 32839:32839 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9289105Z STAGE:2023-01-11 22:03:19 32839:32839 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9289448Z STAGE:2023-01-11 22:03:19 32839:32839 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9289555Z ok (6.742s) 2023-01-11T22:12:51.9289574Z 2023-01-11T22:12:51.9289840Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9289956Z Ran 1 test in 6.742s 2023-01-11T22:12:51.9289975Z 2023-01-11T22:12:51.9290071Z OK 2023-01-11T22:12:51.9290093Z 2023-01-11T22:12:51.9290220Z Generating XML reports... 
2023-01-11T22:12:51.9290653Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220313.xml 2023-01-11T22:12:51.9291028Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9291208Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9291594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9291790Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9291810Z 2023-01-11T22:12:51.9291924Z Running tests... 2023-01-11T22:12:51.9292194Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9292508Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9292823Z test_ddp_sync_module_states (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9293229Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32961 2023-01-11T22:12:51.9293448Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32962 2023-01-11T22:12:51.9293832Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9294105Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9294505Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9294697Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9295068Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9295243Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9295609Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9295801Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9296047Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9296297Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9296697Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9297095Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9297326Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9297561Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9297843Z [1673474607.179699] [7e0e28e30a97:32961:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9298060Z [1673474607.186506] [7e0e28e30a97:32961:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9298303Z [1673474607.186506] [7e0e28e30a97:32961:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9298578Z [1673474607.182092] [7e0e28e30a97:32962:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9298812Z [1673474607.186966] [7e0e28e30a97:32962:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9299051Z [1673474607.186966] [7e0e28e30a97:32962:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9299161Z ok (5.518s) 2023-01-11T22:12:51.9299181Z 2023-01-11T22:12:51.9299457Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9299573Z Ran 1 test in 5.518s 2023-01-11T22:12:51.9299592Z 2023-01-11T22:12:51.9299691Z OK 2023-01-11T22:12:51.9299710Z 2023-01-11T22:12:51.9299818Z Generating XML reports... 
2023-01-11T22:12:51.9300267Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220322.xml 2023-01-11T22:12:51.9300642Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9300822Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9301205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9301398Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9301493Z 2023-01-11T22:12:51.9301611Z Running tests... 2023-01-11T22:12:51.9301885Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9302199Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9302454Z test_ddp_uneven_input_exception (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9302727Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33075 2023-01-11T22:12:51.9302952Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33076 2023-01-11T22:12:51.9303331Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9303509Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9303890Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9304087Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9304456Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9304613Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9304996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9305189Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9305436Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9305681Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9306080Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9306481Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9306716Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9306943Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9307204Z [1673474615.223657] [7e0e28e30a97:33076:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9307443Z [1673474615.229142] [7e0e28e30a97:33076:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9307684Z [1673474615.229142] [7e0e28e30a97:33076:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9307959Z [1673474615.223680] [7e0e28e30a97:33075:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9308195Z [1673474615.229502] [7e0e28e30a97:33075:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9308433Z [1673474615.229502] [7e0e28e30a97:33075:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9308540Z ok (5.423s) 2023-01-11T22:12:51.9308560Z 2023-01-11T22:12:51.9308830Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9308946Z Ran 1 test in 5.423s 2023-01-11T22:12:51.9308966Z 2023-01-11T22:12:51.9309062Z OK 2023-01-11T22:12:51.9309082Z 2023-01-11T22:12:51.9309190Z Generating XML reports... 
2023-01-11T22:12:51.9309636Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220330.xml 2023-01-11T22:12:51.9310011Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9310254Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9310643Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9310837Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9310857Z 2023-01-11T22:12:51.9310970Z Running tests... 2023-01-11T22:12:51.9311285Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9311596Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9311877Z test_ddp_uneven_input_join_disable (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9312626Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78684 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.611s) 2023-01-11T22:12:51.9312651Z 2023-01-11T22:12:51.9312917Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9313029Z Ran 1 test in 1.612s 2023-01-11T22:12:51.9313048Z 2023-01-11T22:12:51.9313159Z OK (skipped=1) 2023-01-11T22:12:51.9313178Z 2023-01-11T22:12:51.9313305Z Generating XML reports... 
2023-01-11T22:12:51.9313757Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220338.xml 2023-01-11T22:12:51.9314131Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9314311Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9314669Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9314865Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9314885Z 2023-01-11T22:12:51.9314996Z Running tests... 2023-01-11T22:12:51.9315261Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9315576Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9315838Z test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9316580Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/75648 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.629s) 2023-01-11T22:12:51.9316600Z 2023-01-11T22:12:51.9316863Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9316982Z Ran 1 test in 1.629s 2023-01-11T22:12:51.9317002Z 2023-01-11T22:12:51.9317111Z OK (skipped=1) 2023-01-11T22:12:51.9317131Z 2023-01-11T22:12:51.9317238Z Generating XML reports... 
2023-01-11T22:12:51.9317686Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220342.xml 2023-01-11T22:12:51.9318056Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9318237Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9318620Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9318813Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9318833Z 2023-01-11T22:12:51.9318942Z Running tests... 2023-01-11T22:12:51.9319206Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9319576Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9319863Z test_ddp_uneven_inputs_stop_iteration_sync_bn (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9320653Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78113 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.613s) 2023-01-11T22:12:51.9320676Z 2023-01-11T22:12:51.9320944Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9321057Z Ran 1 test in 1.613s 2023-01-11T22:12:51.9321076Z 2023-01-11T22:12:51.9321185Z OK (skipped=1) 2023-01-11T22:12:51.9321205Z 2023-01-11T22:12:51.9321331Z Generating XML reports... 
2023-01-11T22:12:51.9321775Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220346.xml 2023-01-11T22:12:51.9322151Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9322331Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9322689Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9322887Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9322906Z 2023-01-11T22:12:51.9323017Z Running tests... 2023-01-11T22:12:51.9323279Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9323591Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9323890Z test_ddp_unused_params_rebuild_buckets_exception (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9324115Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33291 2023-01-11T22:12:51.9324334Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33292 2023-01-11T22:12:51.9324687Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9324867Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9325249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9325444Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9325815Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9325994Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9326374Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9326567Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9326813Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9327039Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9327445Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9327845Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9328079Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9328309Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9328589Z [1673474635.628785] [7e0e28e30a97:33292:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9328892Z [1673474635.634952] [7e0e28e30a97:33292:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9329136Z [1673474635.634952] [7e0e28e30a97:33292:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9329450Z [1673474635.628397] [7e0e28e30a97:33291:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9329686Z [1673474635.633697] [7e0e28e30a97:33291:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9329905Z [1673474635.633697] [7e0e28e30a97:33291:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9330011Z ok (5.915s) 2023-01-11T22:12:51.9330034Z 2023-01-11T22:12:51.9330314Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9330429Z Ran 1 test in 5.915s 2023-01-11T22:12:51.9330449Z 2023-01-11T22:12:51.9330544Z OK 2023-01-11T22:12:51.9330564Z 2023-01-11T22:12:51.9330689Z Generating XML reports... 
2023-01-11T22:12:51.9331136Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220351.xml 2023-01-11T22:12:51.9331516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9331675Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9332057Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9332251Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9332270Z 2023-01-11T22:12:51.9332381Z Running tests... 2023-01-11T22:12:51.9332650Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9333146Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9333421Z test_ddp_zero_output_features (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9333645Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33409 2023-01-11T22:12:51.9333867Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33410 2023-01-11T22:12:51.9334225Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9334400Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9334783Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9334976Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9335352Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9335531Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9335910Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9336106Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9336333Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9336578Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9336979Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9337377Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9337690Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9338077Z /opt/conda/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op 2023-01-11T22:12:51.9338333Z warnings.warn("Initializing zero-element tensors is a no-op") 2023-01-11T22:12:51.9338627Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9339021Z /opt/conda/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op 2023-01-11T22:12:51.9339256Z warnings.warn("Initializing zero-element tensors is a no-op") 2023-01-11T22:12:51.9339532Z [1673474644.154242] [7e0e28e30a97:33410:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9339768Z [1673474644.160666] [7e0e28e30a97:33410:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9340016Z [1673474644.160666] [7e0e28e30a97:33410:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9340287Z [1673474644.147426] [7e0e28e30a97:33409:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9340523Z [1673474644.153219] [7e0e28e30a97:33409:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9340767Z [1673474644.153219] [7e0e28e30a97:33409:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9340873Z ok (5.540s) 2023-01-11T22:12:51.9340894Z 2023-01-11T22:12:51.9341168Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9341283Z Ran 1 test in 5.540s 2023-01-11T22:12:51.9341306Z 2023-01-11T22:12:51.9341382Z OK 2023-01-11T22:12:51.9341401Z 2023-01-11T22:12:51.9341528Z Generating XML reports... 2023-01-11T22:12:51.9341978Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220359.xml 2023-01-11T22:12:51.9342348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9342529Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9342912Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9343106Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9343126Z 2023-01-11T22:12:51.9343237Z Running tests... 2023-01-11T22:12:51.9343484Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9343797Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9344064Z test_destroy_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9344286Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33523 2023-01-11T22:12:51.9344506Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33524 2023-01-11T22:12:51.9344882Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9345064Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9345447Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9345637Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9345987Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9346232Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9346617Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9346811Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9347054Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9347345Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9347758Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9348154Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9348384Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9348610Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9348834Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9349069Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9349468Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9349858Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9349964Z ok (4.429s) 2023-01-11T22:12:51.9349984Z 2023-01-11T22:12:51.9350255Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9350370Z Ran 1 test in 4.429s 2023-01-11T22:12:51.9350390Z 2023-01-11T22:12:51.9350465Z OK 2023-01-11T22:12:51.9350504Z 2023-01-11T22:12:51.9350614Z Generating XML reports... 2023-01-11T22:12:51.9351063Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220407.xml 2023-01-11T22:12:51.9351436Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9351617Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9352002Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9352195Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9352214Z 2023-01-11T22:12:51.9352326Z Running tests... 
2023-01-11T22:12:51.9352595Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9352892Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9353149Z test_destroy_group (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9353375Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33626 2023-01-11T22:12:51.9353594Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33627 2023-01-11T22:12:51.9353967Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9354148Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9354530Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9354722Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9355067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9355245Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9355700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9355893Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9356142Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9356388Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9356847Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9357255Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9357487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9357708Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9357937Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9358173Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9358569Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9358963Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9359068Z ok (4.335s) 2023-01-11T22:12:51.9359088Z 2023-01-11T22:12:51.9359355Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9359469Z Ran 1 test in 4.335s 2023-01-11T22:12:51.9359489Z 2023-01-11T22:12:51.9359586Z OK 2023-01-11T22:12:51.9359605Z 2023-01-11T22:12:51.9359712Z Generating XML reports... 
2023-01-11T22:12:51.9360159Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220414.xml 2023-01-11T22:12:51.9360536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9360716Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9361098Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9361330Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9361350Z 2023-01-11T22:12:51.9361463Z Running tests... 2023-01-11T22:12:51.9361733Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9362047Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9362305Z test_detect_ddp_is_actually_static (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9363054Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78767 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.615s) 2023-01-11T22:12:51.9363075Z 2023-01-11T22:12:51.9363345Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9363461Z Ran 1 test in 1.615s 2023-01-11T22:12:51.9363481Z 2023-01-11T22:12:51.9363592Z OK (skipped=1) 2023-01-11T22:12:51.9363611Z 2023-01-11T22:12:51.9363740Z Generating XML reports... 
2023-01-11T22:12:51.9364188Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220421.xml 2023-01-11T22:12:51.9364560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9364737Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9365173Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9365368Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9365387Z 2023-01-11T22:12:51.9365499Z Running tests... 2023-01-11T22:12:51.9365767Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9366126Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9366410Z test_different_graph_across_ranks (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9367155Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78748 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.619s) 2023-01-11T22:12:51.9367179Z 2023-01-11T22:12:51.9367445Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9367558Z Ran 1 test in 1.620s 2023-01-11T22:12:51.9367577Z 2023-01-11T22:12:51.9367688Z OK (skipped=1) 2023-01-11T22:12:51.9367708Z 2023-01-11T22:12:51.9367816Z Generating XML reports... 
2023-01-11T22:12:51.9368266Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220425.xml 2023-01-11T22:12:51.9368636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9368812Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9369196Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9369388Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9369411Z 2023-01-11T22:12:51.9369522Z Running tests... 2023-01-11T22:12:51.9369785Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9370075Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9370352Z test_dump_DDP_relevant_env_vars (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9370576Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33797 2023-01-11T22:12:51.9370794Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33798 2023-01-11T22:12:51.9371166Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9371347Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9371725Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9371921Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9372288Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9372447Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9372829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9373198Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9373450Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9373697Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9374103Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9374598Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9374828Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9375056Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9375143Z ok (4.243s) 2023-01-11T22:12:51.9375163Z 2023-01-11T22:12:51.9375492Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9375612Z Ran 1 test in 4.244s 2023-01-11T22:12:51.9375632Z 2023-01-11T22:12:51.9375726Z OK 2023-01-11T22:12:51.9375745Z 2023-01-11T22:12:51.9375872Z Generating XML reports... 2023-01-11T22:12:51.9376328Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220429.xml 2023-01-11T22:12:51.9376701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9376884Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9377245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9377439Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9377458Z 2023-01-11T22:12:51.9377570Z Running tests... 2023-01-11T22:12:51.9377841Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9378155Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9378415Z test_gather (__main__.TestDistBackendWithSpawn) ... 
skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:12:51.9378435Z 2023-01-11T22:12:51.9378694Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9378809Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9378829Z 2023-01-11T22:12:51.9378944Z OK (skipped=1) 2023-01-11T22:12:51.9378964Z 2023-01-11T22:12:51.9379071Z Generating XML reports... 2023-01-11T22:12:51.9379518Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220436.xml 2023-01-11T22:12:51.9379889Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9380073Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9380450Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9380640Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9380660Z 2023-01-11T22:12:51.9380769Z Running tests... 2023-01-11T22:12:51.9381030Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9381321Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9381595Z test_gather_checks (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:12:51.9381615Z 2023-01-11T22:12:51.9381881Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9381994Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9382014Z 2023-01-11T22:12:51.9382121Z OK (skipped=1) 2023-01-11T22:12:51.9382140Z 2023-01-11T22:12:51.9382270Z Generating XML reports... 
2023-01-11T22:12:51.9382718Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220438.xml 2023-01-11T22:12:51.9383090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9383268Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9383626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9383885Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9383904Z 2023-01-11T22:12:51.9384013Z Running tests... 2023-01-11T22:12:51.9384283Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9384596Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9384930Z test_gather_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA gather (0.002s) 2023-01-11T22:12:51.9384953Z 2023-01-11T22:12:51.9385226Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9385342Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9385361Z 2023-01-11T22:12:51.9385472Z OK (skipped=1) 2023-01-11T22:12:51.9385492Z 2023-01-11T22:12:51.9385600Z Generating XML reports... 
2023-01-11T22:12:51.9386044Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220441.xml 2023-01-11T22:12:51.9386420Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9386600Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9386980Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9387175Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9387196Z 2023-01-11T22:12:51.9387307Z Running tests... 2023-01-11T22:12:51.9387573Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9387886Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9388137Z test_gather_full_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:12:51.9388157Z 2023-01-11T22:12:51.9388426Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9388540Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9388560Z 2023-01-11T22:12:51.9388670Z OK (skipped=1) 2023-01-11T22:12:51.9388690Z 2023-01-11T22:12:51.9388816Z Generating XML reports... 
2023-01-11T22:12:51.9389263Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220443.xml 2023-01-11T22:12:51.9389636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9389814Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9390191Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9390363Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9390382Z 2023-01-11T22:12:51.9390493Z Running tests... 2023-01-11T22:12:51.9390762Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9391076Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9391343Z test_gather_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:12:51.9391363Z 2023-01-11T22:12:51.9391627Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9391745Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9391764Z 2023-01-11T22:12:51.9391877Z OK (skipped=1) 2023-01-11T22:12:51.9391896Z 2023-01-11T22:12:51.9392003Z Generating XML reports... 
2023-01-11T22:12:51.9392443Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220446.xml 2023-01-11T22:12:51.9392811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9392990Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9393437Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9393627Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9393646Z 2023-01-11T22:12:51.9393756Z Running tests... 2023-01-11T22:12:51.9394018Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9394376Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9394633Z test_gather_object (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:12:51.9394653Z 2023-01-11T22:12:51.9394919Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9395033Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9395053Z 2023-01-11T22:12:51.9395163Z OK (skipped=1) 2023-01-11T22:12:51.9395186Z 2023-01-11T22:12:51.9395314Z Generating XML reports... 
2023-01-11T22:12:51.9395759Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220448.xml 2023-01-11T22:12:51.9396129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9396307Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9396692Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9396865Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9396884Z 2023-01-11T22:12:51.9396996Z Running tests... 2023-01-11T22:12:51.9397261Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9397575Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9397858Z test_gather_object_subgroup (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:12:51.9397878Z 2023-01-11T22:12:51.9398138Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9398252Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9398271Z 2023-01-11T22:12:51.9398381Z OK (skipped=1) 2023-01-11T22:12:51.9398400Z 2023-01-11T22:12:51.9398509Z Generating XML reports... 
2023-01-11T22:12:51.9398951Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220450.xml 2023-01-11T22:12:51.9399322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9399502Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9399880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9400076Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9400095Z 2023-01-11T22:12:51.9400206Z Running tests... 2023-01-11T22:12:51.9400471Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9400781Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9401013Z test_get_backend (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9401236Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34131 2023-01-11T22:12:51.9401459Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34132 2023-01-11T22:12:51.9401828Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9402006Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9402385Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9402636Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9403009Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9403165Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9403588Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9403784Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9404029Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9404274Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9404682Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9405085Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9405316Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9405556Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9405764Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9405998Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9406398Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9406789Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9406898Z ok (4.245s) 2023-01-11T22:12:51.9406918Z 2023-01-11T22:12:51.9407183Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9407297Z Ran 1 test in 4.245s 2023-01-11T22:12:51.9407316Z 2023-01-11T22:12:51.9407411Z OK 2023-01-11T22:12:51.9407430Z 2023-01-11T22:12:51.9407557Z Generating XML reports... 2023-01-11T22:12:51.9407989Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220453.xml 2023-01-11T22:12:51.9408359Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9408538Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9408920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9409113Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9409136Z 2023-01-11T22:12:51.9409248Z Running tests... 
2023-01-11T22:12:51.9409512Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9409824Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9410099Z test_get_future (__main__.TestDistBackendWithSpawn) ... skip: get_future is only supported on mpi, nccl and gloo (0.002s) 2023-01-11T22:12:51.9410120Z 2023-01-11T22:12:51.9410364Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9410480Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9410499Z 2023-01-11T22:12:51.9410609Z OK (skipped=1) 2023-01-11T22:12:51.9410628Z 2023-01-11T22:12:51.9410755Z Generating XML reports... 2023-01-11T22:12:51.9411199Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220500.xml 2023-01-11T22:12:51.9411568Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9411812Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9412195Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9412369Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9412408Z 2023-01-11T22:12:51.9412499Z Running tests... 2023-01-11T22:12:51.9412811Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9413415Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9413714Z test_get_rank (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9413935Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34267 2023-01-11T22:12:51.9414154Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34268 2023-01-11T22:12:51.9414539Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9414715Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9415074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9415267Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9415638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9415815Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9416192Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9416381Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9416626Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9416876Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9417258Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9417657Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9417888Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9418117Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9418221Z ok (4.462s) 2023-01-11T22:12:51.9418242Z 2023-01-11T22:12:51.9418510Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9418624Z Ran 1 test in 4.462s 2023-01-11T22:12:51.9418643Z 2023-01-11T22:12:51.9418742Z OK 2023-01-11T22:12:51.9418761Z 2023-01-11T22:12:51.9418887Z Generating XML reports... 2023-01-11T22:12:51.9419314Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220502.xml 2023-01-11T22:12:51.9419685Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9419860Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9420241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9420434Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9420453Z 2023-01-11T22:12:51.9420565Z Running tests... 2023-01-11T22:12:51.9420830Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9421143Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9421508Z test_get_rank_size_full_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9421710Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34370 2023-01-11T22:12:51.9421931Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34371 2023-01-11T22:12:51.9422382Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9422566Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9422954Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9423145Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9423508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9423686Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9424042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9424232Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9424477Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9424726Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9425125Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9425521Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9425748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9425992Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9426214Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9426431Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9426827Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9427214Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9427317Z ok (4.334s) 2023-01-11T22:12:51.9427338Z 2023-01-11T22:12:51.9427604Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9427718Z Ran 1 test in 4.334s 2023-01-11T22:12:51.9427738Z 2023-01-11T22:12:51.9427833Z OK 2023-01-11T22:12:51.9427852Z 2023-01-11T22:12:51.9427978Z Generating XML reports... 2023-01-11T22:12:51.9428426Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220509.xml 2023-01-11T22:12:51.9428781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9428960Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9429343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9429537Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9429556Z 2023-01-11T22:12:51.9429670Z Running tests... 
2023-01-11T22:12:51.9429936Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9430249Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9430512Z test_get_rank_size_group (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9430779Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34473 2023-01-11T22:12:51.9430995Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34474 2023-01-11T22:12:51.9431370Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9431600Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9431993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9432185Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9432551Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9432727Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9433110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9433283Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9433555Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9433801Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9434209Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9434609Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9434842Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9435083Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9435311Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9435547Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9435923Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9436322Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9436427Z ok (4.336s) 2023-01-11T22:12:51.9436446Z 2023-01-11T22:12:51.9436714Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9436828Z Ran 1 test in 4.336s 2023-01-11T22:12:51.9436848Z 2023-01-11T22:12:51.9436943Z OK 2023-01-11T22:12:51.9436962Z 2023-01-11T22:12:51.9437089Z Generating XML reports... 
2023-01-11T22:12:51.9437537Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220516.xml 2023-01-11T22:12:51.9437895Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9438075Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9438457Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9438656Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9438676Z 2023-01-11T22:12:51.9438788Z Running tests... 2023-01-11T22:12:51.9439054Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9439368Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9439636Z test_invalid_static_graph (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9439859Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34576 2023-01-11T22:12:51.9440123Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34577 2023-01-11T22:12:51.9440501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9440680Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9441106Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9441302Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9441674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9441853Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9442233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9442410Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9442656Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9442900Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9443306Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9443703Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9443932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9444160Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9444440Z [1673474727.896142] [7e0e28e30a97:34577:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9444683Z [1673474727.901576] [7e0e28e30a97:34577:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9444926Z [1673474727.901576] [7e0e28e30a97:34577:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9445186Z [1673474727.887465] [7e0e28e30a97:34576:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9445420Z [1673474727.893457] [7e0e28e30a97:34576:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9445660Z [1673474727.893457] [7e0e28e30a97:34576:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9445766Z ok (5.931s) 2023-01-11T22:12:51.9445786Z 2023-01-11T22:12:51.9446063Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9446178Z Ran 1 test in 5.931s 2023-01-11T22:12:51.9446198Z 2023-01-11T22:12:51.9446292Z OK 2023-01-11T22:12:51.9446312Z 2023-01-11T22:12:51.9446439Z Generating XML reports... 
2023-01-11T22:12:51.9446889Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220523.xml 2023-01-11T22:12:51.9447246Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9447427Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9447811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9448005Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9448023Z 2023-01-11T22:12:51.9448134Z Running tests... 2023-01-11T22:12:51.9448470Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9448783Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9449023Z test_irecv (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9449224Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34694 2023-01-11T22:12:51.9449488Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34695 2023-01-11T22:12:51.9449875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9450054Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9450433Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9450624Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9450999Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9451176Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9451553Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9451727Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9451976Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9452224Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9452626Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9453190Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9453432Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9453663Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9453944Z [1673474735.666508] [7e0e28e30a97:34695:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9454184Z [1673474736.458512] [7e0e28e30a97:34695:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9454407Z [1673474736.458512] [7e0e28e30a97:34695:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9454680Z [1673474735.659249] [7e0e28e30a97:34694:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9454911Z [1673474736.458665] [7e0e28e30a97:34694:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9455155Z [1673474736.458665] [7e0e28e30a97:34694:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9455264Z ok (5.431s) 2023-01-11T22:12:51.9455284Z 2023-01-11T22:12:51.9455561Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9455678Z Ran 1 test in 5.432s 2023-01-11T22:12:51.9455701Z 2023-01-11T22:12:51.9455797Z OK 2023-01-11T22:12:51.9455816Z 2023-01-11T22:12:51.9455942Z Generating XML reports... 
2023-01-11T22:12:51.9456370Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220531.xml 2023-01-11T22:12:51.9456743Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9456922Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9457395Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9457592Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9457612Z 2023-01-11T22:12:51.9457723Z Running tests... 2023-01-11T22:12:51.9457991Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9458362Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9458613Z test_isend (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9458817Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34804 2023-01-11T22:12:51.9459035Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34805 2023-01-11T22:12:51.9459415Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9459598Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9459978Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9460170Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9460532Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9460711Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9461070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9461261Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9461509Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9461755Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9462164Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9462558Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9462784Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9463016Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9463293Z [1673474743.585493] [7e0e28e30a97:34804:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9463510Z [1673474744.380814] [7e0e28e30a97:34804:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9463751Z [1673474744.380814] [7e0e28e30a97:34804:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9464030Z [1673474743.593487] [7e0e28e30a97:34805:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9464264Z [1673474744.395102] [7e0e28e30a97:34805:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9464503Z [1673474744.395102] [7e0e28e30a97:34805:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9464610Z ok (5.409s) 2023-01-11T22:12:51.9464631Z 2023-01-11T22:12:51.9464903Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9465017Z Ran 1 test in 5.409s 2023-01-11T22:12:51.9465037Z 2023-01-11T22:12:51.9465132Z OK 2023-01-11T22:12:51.9465151Z 2023-01-11T22:12:51.9465259Z Generating XML reports... 
2023-01-11T22:12:51.9465707Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220539.xml 2023-01-11T22:12:51.9466152Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9466332Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9466716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9466957Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9466978Z 2023-01-11T22:12:51.9467092Z Running tests... 2023-01-11T22:12:51.9467366Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9467680Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9467932Z test_isend_autograd_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9468159Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34914 2023-01-11T22:12:51.9468378Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34915 2023-01-11T22:12:51.9468749Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9468996Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9469380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9469573Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9469941Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9470099Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9470477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9470671Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9470916Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9471162Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9471567Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9471965Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9472195Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9472534Z STAGE:2023-01-11 22:05:51 34914:34914 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9472748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9473084Z STAGE:2023-01-11 22:05:51 34915:34915 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9473363Z [1673474751.650355] [7e0e28e30a97:34914:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9473601Z [1673474752.706794] [7e0e28e30a97:34914:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9473844Z [1673474752.706794] [7e0e28e30a97:34914:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9474187Z STAGE:2023-01-11 22:05:53 34914:34914 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9474461Z [1673474751.671748] [7e0e28e30a97:34915:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9474749Z [1673474752.680852] [7e0e28e30a97:34915:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9474989Z [1673474752.680852] [7e0e28e30a97:34915:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9475337Z STAGE:2023-01-11 22:05:53 34915:34915 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9475711Z STAGE:2023-01-11 22:05:53 34914:34914 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9476075Z STAGE:2023-01-11 22:05:53 34915:34915 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9476178Z ok (6.041s) 2023-01-11T22:12:51.9476198Z 2023-01-11T22:12:51.9476464Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9476579Z Ran 1 test in 6.041s 2023-01-11T22:12:51.9476599Z 2023-01-11T22:12:51.9476698Z OK 2023-01-11T22:12:51.9476717Z 2023-01-11T22:12:51.9476844Z Generating XML reports... 2023-01-11T22:12:51.9477293Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220547.xml 2023-01-11T22:12:51.9477664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9477825Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9478212Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9478405Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9478425Z 2023-01-11T22:12:51.9478536Z Running tests... 
2023-01-11T22:12:51.9478807Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9479122Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9479390Z test_isend_torch_profiler (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9479613Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35028 2023-01-11T22:12:51.9479815Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35029 2023-01-11T22:12:51.9480189Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9480372Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9480752Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9480942Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9481308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9481490Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9481867Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9482056Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9482284Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9482533Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9482935Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9483334Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9483567Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9483970Z STAGE:2023-01-11 22:06:00 35029:35029 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9484199Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9484533Z STAGE:2023-01-11 22:06:00 35028:35028 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9484870Z [1673474760.187990] [7e0e28e30a97:35028:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9485093Z [1673474761.242624] [7e0e28e30a97:35028:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9485333Z [1673474761.242624] [7e0e28e30a97:35028:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9485680Z STAGE:2023-01-11 22:06:01 35028:35028 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9485953Z [1673474760.211588] [7e0e28e30a97:35029:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9486188Z [1673474761.257613] [7e0e28e30a97:35029:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9486426Z [1673474761.257613] [7e0e28e30a97:35029:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9486773Z STAGE:2023-01-11 22:06:01 35029:35029 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9487124Z STAGE:2023-01-11 22:06:01 35028:35028 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9487473Z STAGE:2023-01-11 22:06:01 35029:35029 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9487558Z ok (6.044s) 2023-01-11T22:12:51.9487598Z 2023-01-11T22:12:51.9487845Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9487965Z Ran 1 test in 6.044s 2023-01-11T22:12:51.9487985Z 2023-01-11T22:12:51.9488078Z OK 2023-01-11T22:12:51.9488098Z 2023-01-11T22:12:51.9488223Z Generating XML reports... 2023-01-11T22:12:51.9488671Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220556.xml 2023-01-11T22:12:51.9489048Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9489228Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9489606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9489780Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9489817Z 2023-01-11T22:12:51.9489909Z Running tests... 
2023-01-11T22:12:51.9490176Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9490495Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9490778Z test_monitored_barrier_allreduce_hang (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9490999Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35142 2023-01-11T22:12:51.9491219Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35143 2023-01-11T22:12:51.9491593Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9491771Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9492129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9492323Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9492692Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9493326Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9493812Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9494002Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9494341Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9494593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9494983Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9495376Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9495614Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9495855Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9496080Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9496315Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9496715Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9497106Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9497350Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:12:51.9497571Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:12:51.9497966Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:12:51.9498357Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 
2023-01-11T22:12:51.9498591Z [E ProcessGroupGloo.cpp:138] [Rank 0]: Rank 1 failed to pass monitoredBarrier in 100 ms 2023-01-11T22:12:51.9498698Z ok (21.351s) 2023-01-11T22:12:51.9498722Z 2023-01-11T22:12:51.9498990Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9499106Z Ran 1 test in 21.351s 2023-01-11T22:12:51.9499125Z 2023-01-11T22:12:51.9499219Z OK 2023-01-11T22:12:51.9499238Z 2023-01-11T22:12:51.9499363Z Generating XML reports... 2023-01-11T22:12:51.9499792Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220604.xml 2023-01-11T22:12:51.9500163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9500346Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9500725Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9500920Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9500939Z 2023-01-11T22:12:51.9501054Z Running tests... 2023-01-11T22:12:51.9501323Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9501632Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9501931Z test_monitored_barrier_allreduce_hang_wait_all_ranks (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9502134Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35263 2023-01-11T22:12:51.9502442Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35264 2023-01-11T22:12:51.9502819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9502995Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9503362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9503583Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9503972Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9504164Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9504527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9504719Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9504968Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9505214Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9505612Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9506008Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9506239Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9506479Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9506701Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9506917Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9507314Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9507703Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9507949Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:12:51.9508190Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:12:51.9508576Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:12:51.9508959Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:12:51.9509193Z [E ProcessGroupGloo.cpp:2803] [Rank 0]: Rank 1 failed to pass monitoredBarrier in 100 ms 2023-01-11T22:12:51.9509427Z [E ProcessGroupGloo.cpp:138] [Rank 0]: Ranks 1 failed to pass monitoredBarrier in 100 ms 2023-01-11T22:12:51.9509513Z ok (21.267s) 2023-01-11T22:12:51.9509534Z 2023-01-11T22:12:51.9509806Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9509920Z Ran 1 test in 21.267s 2023-01-11T22:12:51.9509939Z 2023-01-11T22:12:51.9510034Z OK 2023-01-11T22:12:51.9510053Z 2023-01-11T22:12:51.9510183Z Generating XML reports... 
2023-01-11T22:12:51.9510632Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220628.xml 2023-01-11T22:12:51.9511002Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9511183Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9511562Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9511797Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9511816Z 2023-01-11T22:12:51.9511924Z Running tests... 2023-01-11T22:12:51.9512197Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9512506Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9512966Z test_monitored_barrier_failure_order (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.9512989Z 2023-01-11T22:12:51.9513257Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9513373Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9513393Z 2023-01-11T22:12:51.9513504Z OK (skipped=1) 2023-01-11T22:12:51.9513523Z 2023-01-11T22:12:51.9513630Z Generating XML reports... 
2023-01-11T22:12:51.9514076Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220652.xml 2023-01-11T22:12:51.9514451Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9514628Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9515008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9515204Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9515223Z 2023-01-11T22:12:51.9515337Z Running tests... 2023-01-11T22:12:51.9515599Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9515910Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9516282Z test_monitored_barrier_gloo (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.9516325Z 2023-01-11T22:12:51.9516568Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9516682Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9516702Z 2023-01-11T22:12:51.9516814Z OK (skipped=1) 2023-01-11T22:12:51.9516833Z 2023-01-11T22:12:51.9516957Z Generating XML reports... 
2023-01-11T22:12:51.9517403Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220654.xml 2023-01-11T22:12:51.9517775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9517952Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9518329Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9518501Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9518521Z 2023-01-11T22:12:51.9518638Z Running tests... 2023-01-11T22:12:51.9518903Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9519212Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9519634Z test_monitored_barrier_gloo_rank_0_timeout (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.9519654Z 2023-01-11T22:12:51.9519919Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9520034Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9520053Z 2023-01-11T22:12:51.9520163Z OK (skipped=1) 2023-01-11T22:12:51.9520182Z 2023-01-11T22:12:51.9520307Z Generating XML reports... 
2023-01-11T22:12:51.9520727Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220657.xml 2023-01-11T22:12:51.9521095Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9521337Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9521718Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9521910Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9521930Z 2023-01-11T22:12:51.9522040Z Running tests... 2023-01-11T22:12:51.9522344Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9522663Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9523052Z test_monitored_barrier_gloo_subgroup (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.9523091Z 2023-01-11T22:12:51.9523333Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9523446Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9523470Z 2023-01-11T22:12:51.9523578Z OK (skipped=1) 2023-01-11T22:12:51.9523597Z 2023-01-11T22:12:51.9523722Z Generating XML reports... 
2023-01-11T22:12:51.9524164Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220659.xml 2023-01-11T22:12:51.9524534Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9524714Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9525093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9525266Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9525303Z 2023-01-11T22:12:51.9525394Z Running tests... 2023-01-11T22:12:51.9525659Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9525968Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9526381Z test_monitored_barrier_wait_all_ranks (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.9526400Z 2023-01-11T22:12:51.9526659Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9526775Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9526794Z 2023-01-11T22:12:51.9526905Z OK (skipped=1) 2023-01-11T22:12:51.9526927Z 2023-01-11T22:12:51.9527053Z Generating XML reports... 
2023-01-11T22:12:51.9527472Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220702.xml 2023-01-11T22:12:51.9527835Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9528013Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9528391Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9528587Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9528606Z 2023-01-11T22:12:51.9528716Z Running tests... 2023-01-11T22:12:51.9528980Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9529289Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9529694Z test_nccl_backend_bool_allgather (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.002s) 2023-01-11T22:12:51.9529714Z 2023-01-11T22:12:51.9529956Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9530070Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9530090Z 2023-01-11T22:12:51.9530199Z OK (skipped=1) 2023-01-11T22:12:51.9530217Z 2023-01-11T22:12:51.9530339Z Generating XML reports... 
2023-01-11T22:12:51.9530884Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220704.xml 2023-01-11T22:12:51.9531251Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9531431Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9531855Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9532050Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9532070Z 2023-01-11T22:12:51.9532162Z Running tests... 2023-01-11T22:12:51.9532427Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9532739Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9533348Z test_nccl_backend_bool_allreduce (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.002s) 2023-01-11T22:12:51.9533376Z 2023-01-11T22:12:51.9533642Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9533758Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9533778Z 2023-01-11T22:12:51.9533909Z OK (skipped=1) 2023-01-11T22:12:51.9533928Z 2023-01-11T22:12:51.9534052Z Generating XML reports... 
2023-01-11T22:12:51.9534483Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220706.xml 2023-01-11T22:12:51.9534855Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9535033Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9535414Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9535606Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9535629Z 2023-01-11T22:12:51.9535740Z Running tests... 2023-01-11T22:12:51.9536002Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9536313Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9536712Z test_nccl_backend_bool_broadcast (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.002s) 2023-01-11T22:12:51.9536732Z 2023-01-11T22:12:51.9536979Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9537094Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9537113Z 2023-01-11T22:12:51.9537222Z OK (skipped=1) 2023-01-11T22:12:51.9537241Z 2023-01-11T22:12:51.9537367Z Generating XML reports... 
2023-01-11T22:12:51.9537811Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220709.xml 2023-01-11T22:12:51.9538182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9538365Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9538739Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9538931Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9538950Z 2023-01-11T22:12:51.9539045Z Running tests... 2023-01-11T22:12:51.9539309Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9539621Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9540016Z test_nccl_backend_bool_reduce (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'nccl'} (0.003s) 2023-01-11T22:12:51.9540036Z 2023-01-11T22:12:51.9540305Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9540509Z Ran 1 test in 0.003s 2023-01-11T22:12:51.9540528Z 2023-01-11T22:12:51.9540639Z OK (skipped=1) 2023-01-11T22:12:51.9540659Z 2023-01-11T22:12:51.9540784Z Generating XML reports... 
2023-01-11T22:12:51.9541234Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220711.xml 2023-01-11T22:12:51.9541652Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9541837Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9542224Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9542419Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9542439Z 2023-01-11T22:12:51.9542548Z Running tests... 2023-01-11T22:12:51.9542806Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9543123Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9543424Z test_nccl_high_priority_stream (__main__.TestDistBackendWithSpawn) ... skip: Only NCCL backend supports high priority stream (0.002s) 2023-01-11T22:12:51.9543445Z 2023-01-11T22:12:51.9543709Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9543804Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9543823Z 2023-01-11T22:12:51.9543938Z OK (skipped=1) 2023-01-11T22:12:51.9543958Z 2023-01-11T22:12:51.9544085Z Generating XML reports... 
2023-01-11T22:12:51.9544525Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220714.xml 2023-01-11T22:12:51.9544893Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9545072Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9545454Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9545648Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9545667Z 2023-01-11T22:12:51.9545758Z Running tests... 2023-01-11T22:12:51.9546022Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9546337Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9546592Z test_new_subgroups (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T22:12:51.9546612Z 2023-01-11T22:12:51.9546877Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9546991Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9547011Z 2023-01-11T22:12:51.9547121Z OK (skipped=1) 2023-01-11T22:12:51.9547141Z 2023-01-11T22:12:51.9547267Z Generating XML reports... 
2023-01-11T22:12:51.9547714Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220716.xml 2023-01-11T22:12:51.9548063Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9548241Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9548626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9548818Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9548837Z 2023-01-11T22:12:51.9548952Z Running tests... 2023-01-11T22:12:51.9549216Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9549528Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9549800Z test_new_subgroups_by_enumeration (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T22:12:51.9549876Z 2023-01-11T22:12:51.9550147Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9550243Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9550262Z 2023-01-11T22:12:51.9550377Z OK (skipped=1) 2023-01-11T22:12:51.9550396Z 2023-01-11T22:12:51.9550523Z Generating XML reports... 
2023-01-11T22:12:51.9551030Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220718.xml 2023-01-11T22:12:51.9551411Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9551589Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9551970Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9552161Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9552184Z 2023-01-11T22:12:51.9552296Z Running tests... 2023-01-11T22:12:51.9552538Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9552847Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9553157Z test_new_subgroups_by_enumeration_input_rank_exceeds_world_size (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T22:12:51.9553180Z 2023-01-11T22:12:51.9553442Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9553557Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9553577Z 2023-01-11T22:12:51.9553687Z OK (skipped=1) 2023-01-11T22:12:51.9553706Z 2023-01-11T22:12:51.9553832Z Generating XML reports... 
2023-01-11T22:12:51.9554279Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220721.xml 2023-01-11T22:12:51.9554630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9554813Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9555195Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9555389Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9555409Z 2023-01-11T22:12:51.9555525Z Running tests... 2023-01-11T22:12:51.9555791Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9556101Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9556405Z test_new_subgroups_by_enumeration_negative_input_rank (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9556632Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35813 2023-01-11T22:12:51.9556837Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35814 2023-01-11T22:12:51.9557205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9557383Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9557764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9557959Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9558331Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9558509Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9558887Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9559059Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9559374Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9559618Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9560024Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9560474Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9560711Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9560942Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9561048Z ok (4.262s) 2023-01-11T22:12:51.9561067Z 2023-01-11T22:12:51.9561339Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9561438Z Ran 1 test in 4.262s 2023-01-11T22:12:51.9561457Z 2023-01-11T22:12:51.9561552Z OK 2023-01-11T22:12:51.9561571Z 2023-01-11T22:12:51.9561697Z Generating XML reports... 2023-01-11T22:12:51.9562141Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220723.xml 2023-01-11T22:12:51.9562511Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9562693Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9563070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9563263Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9563283Z 2023-01-11T22:12:51.9563395Z Running tests... 2023-01-11T22:12:51.9563641Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9563954Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9564253Z test_new_subgroups_group_size_exceeds_world_size (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9564474Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35916 2023-01-11T22:12:51.9564693Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35917 2023-01-11T22:12:51.9565066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9565245Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9565622Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9565795Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9566163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9566345Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9566726Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9566918Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9567167Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9567415Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9567819Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9568213Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9568490Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9568717Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9568824Z ok (4.328s) 2023-01-11T22:12:51.9568845Z 2023-01-11T22:12:51.9569118Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9569233Z Ran 1 test in 4.328s 2023-01-11T22:12:51.9569253Z 2023-01-11T22:12:51.9569346Z OK 2023-01-11T22:12:51.9569410Z 2023-01-11T22:12:51.9569542Z Generating XML reports... 2023-01-11T22:12:51.9569994Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220730.xml 2023-01-11T22:12:51.9570369Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9570529Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9570909Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9571107Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9571126Z 2023-01-11T22:12:51.9571238Z Running tests... 2023-01-11T22:12:51.9571506Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9571821Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9572104Z test_new_subgroups_overlap_not_allowed (__main__.TestDistBackendWithSpawn) ... 
skip: Test requires world size of 4 (0.002s) 2023-01-11T22:12:51.9572124Z 2023-01-11T22:12:51.9572387Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9572482Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9572519Z 2023-01-11T22:12:51.9572611Z OK (skipped=1) 2023-01-11T22:12:51.9572630Z 2023-01-11T22:12:51.9572754Z Generating XML reports... 2023-01-11T22:12:51.9573591Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220737.xml 2023-01-11T22:12:51.9573979Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9574156Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9574535Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9574727Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9574749Z 2023-01-11T22:12:51.9574859Z Running tests... 2023-01-11T22:12:51.9575101Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9575412Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9575715Z test_new_subgroups_world_size_not_divisible_by_group_size (__main__.TestDistBackendWithSpawn) ... skip: Test requires world size of 4 (0.002s) 2023-01-11T22:12:51.9575738Z 2023-01-11T22:12:51.9576003Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9576115Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9576135Z 2023-01-11T22:12:51.9576245Z OK (skipped=1) 2023-01-11T22:12:51.9576264Z 2023-01-11T22:12:51.9576390Z Generating XML reports... 
2023-01-11T22:12:51.9576837Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220739.xml 2023-01-11T22:12:51.9577207Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9577365Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9577745Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9577937Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9578045Z 2023-01-11T22:12:51.9578162Z Running tests... 2023-01-11T22:12:51.9578429Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9578740Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9579021Z test_output_unused_in_loss_dict_module (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9579837Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78112 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.636s) 2023-01-11T22:12:51.9579862Z 2023-01-11T22:12:51.9580137Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9580234Z Ran 1 test in 1.636s 2023-01-11T22:12:51.9580273Z 2023-01-11T22:12:51.9580368Z OK (skipped=1) 2023-01-11T22:12:51.9580388Z 2023-01-11T22:12:51.9580515Z Generating XML reports... 
2023-01-11T22:12:51.9580964Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220741.xml 2023-01-11T22:12:51.9581336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9581519Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9581900Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9582091Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9582111Z 2023-01-11T22:12:51.9582222Z Running tests... 2023-01-11T22:12:51.9582467Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9582776Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9583061Z test_output_unused_in_loss_tuple_module (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9583284Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36119 2023-01-11T22:12:51.9583507Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36120 2023-01-11T22:12:51.9583882Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9584060Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9584443Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9584634Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9584986Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9585168Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9585544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9585735Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9585981Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9586229Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9586626Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9587022Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9587232Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9587526Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9587802Z [1673474870.777160] [7e0e28e30a97:36120:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9588041Z [1673474870.783526] [7e0e28e30a97:36120:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9588328Z [1673474870.783526] [7e0e28e30a97:36120:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9588607Z [1673474870.776848] [7e0e28e30a97:36119:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9588838Z [1673474870.782336] [7e0e28e30a97:36119:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9589078Z [1673474870.782336] [7e0e28e30a97:36119:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9589187Z ok (5.936s) 2023-01-11T22:12:51.9589207Z 2023-01-11T22:12:51.9589482Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9589577Z Ran 1 test in 5.937s 2023-01-11T22:12:51.9589596Z 2023-01-11T22:12:51.9589691Z OK 2023-01-11T22:12:51.9589710Z 2023-01-11T22:12:51.9589834Z Generating XML reports... 
2023-01-11T22:12:51.9590285Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220746.xml 2023-01-11T22:12:51.9590660Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9590839Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9591220Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9591415Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9591435Z 2023-01-11T22:12:51.9591548Z Running tests... 2023-01-11T22:12:51.9591791Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9592106Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9592381Z test_periodic_model_averager (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9592605Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36237 2023-01-11T22:12:51.9592827Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36238 2023-01-11T22:12:51.9593198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9593376Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9593760Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9593931Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9594301Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9594478Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9594857Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9595050Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9595295Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9595541Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9595941Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9596400Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9596611Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9596841Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9597166Z [1673474879.672904] [7e0e28e30a97:36237:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9597446Z [1673474879.675241] [7e0e28e30a97:36238:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9597681Z [1673474879.680560] [7e0e28e30a97:36237:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9597927Z [1673474879.680560] [7e0e28e30a97:36237:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9598153Z [1673474879.680544] [7e0e28e30a97:36238:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9598382Z [1673474879.680544] [7e0e28e30a97:36238:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9598490Z ok (5.938s) 2023-01-11T22:12:51.9598510Z 2023-01-11T22:12:51.9598785Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9598881Z Ran 1 test in 5.938s 2023-01-11T22:12:51.9598900Z 2023-01-11T22:12:51.9598995Z OK 2023-01-11T22:12:51.9599015Z 2023-01-11T22:12:51.9599139Z Generating XML reports... 
2023-01-11T22:12:51.9599584Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220754.xml 2023-01-11T22:12:51.9599959Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9600138Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9600516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9600710Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9600733Z 2023-01-11T22:12:51.9600825Z Running tests... 2023-01-11T22:12:51.9601095Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9601407Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9601694Z test_periodic_model_averager_param_group (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9601914Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36352 2023-01-11T22:12:51.9602140Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36353 2023-01-11T22:12:51.9602513Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9602689Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9603069Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9603243Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9603610Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9603788Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9604160Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9604404Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9604653Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9604898Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9605302Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9605721Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9605963Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9606190Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9606468Z [1673474888.220383] [7e0e28e30a97:36352:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9606745Z [1673474888.224402] [7e0e28e30a97:36353:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9606981Z [1673474888.226969] [7e0e28e30a97:36352:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9607226Z [1673474888.226969] [7e0e28e30a97:36352:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9607450Z [1673474888.229420] [7e0e28e30a97:36353:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9607680Z [1673474888.229420] [7e0e28e30a97:36353:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9607786Z ok (6.033s) 2023-01-11T22:12:51.9607806Z 2023-01-11T22:12:51.9608062Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9608179Z Ran 1 test in 6.034s 2023-01-11T22:12:51.9608199Z 2023-01-11T22:12:51.9608295Z OK 2023-01-11T22:12:51.9608314Z 2023-01-11T22:12:51.9608439Z Generating XML reports... 
2023-01-11T22:12:51.9608883Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220803.xml 2023-01-11T22:12:51.9609259Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9609434Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9609814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9610005Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9610024Z 2023-01-11T22:12:51.9610117Z Running tests... 2023-01-11T22:12:51.9610383Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9610701Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9610983Z test_post_localSGD_optimizer_parity (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9611729Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77123 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.614s) 2023-01-11T22:12:51.9611750Z 2023-01-11T22:12:51.9612014Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9612128Z Ran 1 test in 1.614s 2023-01-11T22:12:51.9612148Z 2023-01-11T22:12:51.9612259Z OK (skipped=1) 2023-01-11T22:12:51.9612278Z 2023-01-11T22:12:51.9612406Z Generating XML reports... 
2023-01-11T22:12:51.9612834Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220811.xml 2023-01-11T22:12:51.9613629Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9613809Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9614192Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9614472Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9614494Z 2023-01-11T22:12:51.9614608Z Running tests... 2023-01-11T22:12:51.9614876Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9615188Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9615482Z test_post_localSGD_optimizer_parity_grad_is_view (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9616214Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/77292 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.617s) 2023-01-11T22:12:51.9616252Z 2023-01-11T22:12:51.9616498Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9616615Z Ran 1 test in 1.617s 2023-01-11T22:12:51.9616634Z 2023-01-11T22:12:51.9616744Z OK (skipped=1) 2023-01-11T22:12:51.9616763Z 2023-01-11T22:12:51.9616890Z Generating XML reports... 
2023-01-11T22:12:51.9617333Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220815.xml 2023-01-11T22:12:51.9617701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9617879Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9618264Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9618457Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9618477Z 2023-01-11T22:12:51.9618568Z Running tests... 2023-01-11T22:12:51.9618829Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9619144Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9619455Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9619677Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36535 2023-01-11T22:12:51.9619894Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36536 2023-01-11T22:12:51.9620261Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9620441Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9620798Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9620990Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9621359Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9621538Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9621916Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9622109Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9622356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9622666Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9623074Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9623451Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9623730Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9623966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9624118Z skip: Need at least 4 CUDA devices (4.243s) 2023-01-11T22:12:51.9624138Z 2023-01-11T22:12:51.9624411Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9624525Z Ran 1 test in 4.243s 2023-01-11T22:12:51.9624546Z 2023-01-11T22:12:51.9624657Z OK (skipped=1) 2023-01-11T22:12:51.9624679Z 2023-01-11T22:12:51.9624807Z Generating XML reports... 2023-01-11T22:12:51.9625235Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220819.xml 2023-01-11T22:12:51.9625603Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9625782Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9626162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9626355Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9626374Z 2023-01-11T22:12:51.9626485Z Running tests... 2023-01-11T22:12:51.9626748Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9627062Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9627392Z test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9627593Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36638 2023-01-11T22:12:51.9627813Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36639 2023-01-11T22:12:51.9628188Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9628365Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9628744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9628935Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9629301Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9629480Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9629858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9630030Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9630276Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9630524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9630925Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9631320Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9631552Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9631841Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9631995Z skip: Need at least 4 CUDA devices (4.225s) 2023-01-11T22:12:51.9632015Z 2023-01-11T22:12:51.9632285Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9632381Z Ran 1 test in 4.225s 2023-01-11T22:12:51.9632400Z 2023-01-11T22:12:51.9632512Z OK (skipped=1) 2023-01-11T22:12:51.9632531Z 2023-01-11T22:12:51.9632704Z Generating XML reports... 2023-01-11T22:12:51.9633160Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220826.xml 2023-01-11T22:12:51.9633530Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9633708Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9634085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9634281Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9634301Z 2023-01-11T22:12:51.9634418Z Running tests... 2023-01-11T22:12:51.9634686Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9634999Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9635291Z test_post_localSGD_optimizer_step_reload (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9636035Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/84886 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.631s) 2023-01-11T22:12:51.9636056Z 2023-01-11T22:12:51.9636321Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9636441Z Ran 1 test in 1.632s 2023-01-11T22:12:51.9636461Z 2023-01-11T22:12:51.9636572Z OK (skipped=1) 2023-01-11T22:12:51.9636592Z 2023-01-11T22:12:51.9636719Z Generating XML reports... 2023-01-11T22:12:51.9637164Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220833.xml 2023-01-11T22:12:51.9637520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9637701Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9638081Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9638276Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9638295Z 2023-01-11T22:12:51.9638407Z Running tests... 2023-01-11T22:12:51.9638672Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9638989Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9639256Z test_reduce_full_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9639459Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36775 2023-01-11T22:12:51.9639681Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36776 2023-01-11T22:12:51.9640053Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9640230Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9640611Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9640804Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9641245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9641424Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9641801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9641973Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9642266Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9642521Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9642927Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9643327Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9643563Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9643805Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9644026Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9644262Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9644638Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9644974Z STAGE:2023-01-11 22:08:41 36776:36776 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9645370Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9645706Z STAGE:2023-01-11 22:08:41 36775:36775 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9645990Z [1673474921.447146] [7e0e28e30a97:36776:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9646228Z [1673474922.494479] [7e0e28e30a97:36776:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9646472Z [1673474922.494479] [7e0e28e30a97:36776:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9646749Z [1673474921.445612] [7e0e28e30a97:36775:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9646980Z [1673474922.494503] [7e0e28e30a97:36775:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9647220Z [1673474922.494503] [7e0e28e30a97:36775:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9647759Z STAGE:2023-01-11 22:08:42 36776:36776 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:08:42 36775:36775 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9647799Z 2023-01-11T22:12:51.9648351Z STAGE:2023-01-11 22:08:42 36775:36775 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 22:08:42 36776:36776 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9648392Z 2023-01-11T22:12:51.9648704Z STAGE:2023-01-11 22:08:42 36776:36776 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9649025Z STAGE:2023-01-11 22:08:42 36775:36775 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9649360Z STAGE:2023-01-11 22:08:42 36776:36776 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9649689Z STAGE:2023-01-11 22:08:42 36775:36775 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9650101Z STAGE:2023-01-11 22:08:42 36776:36776 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9650447Z STAGE:2023-01-11 22:08:42 36775:36775 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9650553Z ok (5.838s) 2023-01-11T22:12:51.9650572Z 2023-01-11T22:12:51.9650885Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9650988Z Ran 1 test in 5.838s 2023-01-11T22:12:51.9651024Z 2023-01-11T22:12:51.9651100Z OK 2023-01-11T22:12:51.9651119Z 2023-01-11T22:12:51.9651245Z Generating XML reports... 
2023-01-11T22:12:51.9651699Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220837.xml 2023-01-11T22:12:51.9652073Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9652257Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9652636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9652831Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9652851Z 2023-01-11T22:12:51.9653128Z Running tests... 2023-01-11T22:12:51.9653389Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9653702Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9653971Z test_reduce_full_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9654194Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36889 2023-01-11T22:12:51.9654413Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36890 2023-01-11T22:12:51.9654792Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9654972Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9655356Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9655596Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9656033Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9656210Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9656589Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9656782Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9657031Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9657360Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9657766Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9658165Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9658381Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9658622Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9658847Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9659080Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9659475Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9659912Z STAGE:2023-01-11 22:08:49 36890:36890 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9660304Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9660700Z STAGE:2023-01-11 22:08:49 36889:36889 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9660988Z [1673474929.789164] [7e0e28e30a97:36889:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9661204Z [1673474930.852033] [7e0e28e30a97:36889:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9661449Z [1673474930.852033] [7e0e28e30a97:36889:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9661757Z [1673474929.789498] [7e0e28e30a97:36890:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9661995Z [1673474930.837888] [7e0e28e30a97:36890:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9662238Z [1673474930.837888] [7e0e28e30a97:36890:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9662803Z STAGE:2023-01-11 22:08:51 36889:36889 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:08:51 36890:36890 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9662826Z 2023-01-11T22:12:51.9663176Z STAGE:2023-01-11 22:08:51 36890:36890 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9663528Z STAGE:2023-01-11 22:08:51 36889:36889 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9663859Z STAGE:2023-01-11 22:08:51 36890:36890 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9664182Z STAGE:2023-01-11 22:08:51 36889:36889 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9664728Z STAGE:2023-01-11 22:08:51 36890:36890 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:08:51 36889:36889 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9664749Z 2023-01-11T22:12:51.9665313Z STAGE:2023-01-11 22:08:51 36889:36889 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 22:08:51 36890:36890 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9665334Z 2023-01-11T22:12:51.9665420Z ok (5.832s) 2023-01-11T22:12:51.9665459Z 2023-01-11T22:12:51.9665707Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9665829Z Ran 1 test in 5.832s 2023-01-11T22:12:51.9665849Z 2023-01-11T22:12:51.9665943Z OK 2023-01-11T22:12:51.9665963Z 2023-01-11T22:12:51.9666089Z Generating XML reports... 
2023-01-11T22:12:51.9666539Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220845.xml 2023-01-11T22:12:51.9666913Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9667096Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9667478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9667654Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9667674Z 2023-01-11T22:12:51.9667786Z Running tests... 2023-01-11T22:12:51.9668054Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9668369Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9668710Z test_reduce_full_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9668933Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37003 2023-01-11T22:12:51.9669152Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37004 2023-01-11T22:12:51.9669612Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9669778Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9670163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9670355Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9670728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9670912Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9671289Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9671481Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9671733Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9671981Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9672362Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9672759Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9672992Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9673239Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9673467Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9673704Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9674102Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9674494Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9674831Z STAGE:2023-01-11 22:08:58 37003:37003 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9675138Z STAGE:2023-01-11 22:08:58 37004:37004 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9675423Z [1673474938.252905] [7e0e28e30a97:37004:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9675662Z [1673474939.268293] [7e0e28e30a97:37004:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9675907Z [1673474939.268293] [7e0e28e30a97:37004:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9676187Z [1673474938.232147] [7e0e28e30a97:37003:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9676421Z [1673474939.277634] [7e0e28e30a97:37003:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9676656Z [1673474939.277634] [7e0e28e30a97:37003:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9677213Z STAGE:2023-01-11 22:08:59 37004:37004 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:08:59 37003:37003 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9677296Z 2023-01-11T22:12:51.9677662Z STAGE:2023-01-11 22:08:59 37004:37004 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9678014Z STAGE:2023-01-11 22:08:59 37003:37003 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9678394Z STAGE:2023-01-11 22:08:59 37004:37004 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9678712Z STAGE:2023-01-11 22:08:59 37003:37003 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9679045Z STAGE:2023-01-11 22:08:59 37004:37004 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9679375Z STAGE:2023-01-11 22:08:59 37003:37003 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9679724Z STAGE:2023-01-11 22:08:59 37004:37004 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9680067Z STAGE:2023-01-11 22:08:59 37003:37003 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9680173Z ok (5.951s) 2023-01-11T22:12:51.9680193Z 2023-01-11T22:12:51.9680462Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9680576Z Ran 1 test in 5.951s 2023-01-11T22:12:51.9680600Z 2023-01-11T22:12:51.9680676Z OK 2023-01-11T22:12:51.9680715Z 2023-01-11T22:12:51.9680823Z Generating XML reports... 
2023-01-11T22:12:51.9681274Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220854.xml 2023-01-11T22:12:51.9681649Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9681832Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9682216Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9682411Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9682431Z 2023-01-11T22:12:51.9682543Z Running tests... 2023-01-11T22:12:51.9682811Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9683109Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9683378Z test_reduce_full_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9683603Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37117 2023-01-11T22:12:51.9683822Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37118 2023-01-11T22:12:51.9684195Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9684377Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9684758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9684952Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9685322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9685479Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9685857Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9686051Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9686296Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9686542Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9687010Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9687405Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9687685Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9687916Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9688144Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9688382Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9688786Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9689179Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9689517Z STAGE:2023-01-11 22:09:06 37118:37118 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9689841Z STAGE:2023-01-11 22:09:06 37117:37117 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9690124Z [1673474946.643177] [7e0e28e30a97:37117:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9690363Z [1673474947.687668] [7e0e28e30a97:37117:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9690609Z [1673474947.687668] [7e0e28e30a97:37117:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9690864Z [1673474946.664756] [7e0e28e30a97:37118:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9691097Z [1673474947.695026] [7e0e28e30a97:37118:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9691336Z [1673474947.695026] [7e0e28e30a97:37118:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9691903Z STAGE:2023-01-11 22:09:08 37117:37117 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:09:08 37118:37118 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9691925Z 2023-01-11T22:12:51.9692277Z STAGE:2023-01-11 22:09:08 37117:37117 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9692627Z STAGE:2023-01-11 22:09:08 37118:37118 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9693287Z STAGE:2023-01-11 22:09:08 37118:37118 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9693717Z STAGE:2023-01-11 22:09:08 37117:37117 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9694048Z STAGE:2023-01-11 22:09:08 37118:37118 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9694372Z STAGE:2023-01-11 22:09:08 37117:37117 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9694702Z STAGE:2023-01-11 22:09:08 37118:37118 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9695044Z STAGE:2023-01-11 22:09:08 37117:37117 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9695151Z ok (5.838s) 2023-01-11T22:12:51.9695171Z 2023-01-11T22:12:51.9695438Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9695555Z Ran 1 test in 5.838s 2023-01-11T22:12:51.9695574Z 2023-01-11T22:12:51.9695669Z OK 2023-01-11T22:12:51.9695781Z 2023-01-11T22:12:51.9695916Z Generating XML reports... 
2023-01-11T22:12:51.9696370Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220902.xml 2023-01-11T22:12:51.9696741Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9696901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9697342Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9697545Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9697565Z 2023-01-11T22:12:51.9697677Z Running tests... 2023-01-11T22:12:51.9697948Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9698261Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9698526Z test_reduce_group_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9698748Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37231 2023-01-11T22:12:51.9698949Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37232 2023-01-11T22:12:51.9699321Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9699502Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9699880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9700070Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9700438Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9700620Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9701003Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9701195Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9701424Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9701674Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9702074Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9702468Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9702702Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9702931Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9703096Z skip: Skipped due to small world size. (4.114s) 2023-01-11T22:12:51.9703117Z 2023-01-11T22:12:51.9703386Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9703481Z Ran 1 test in 4.114s 2023-01-11T22:12:51.9703520Z 2023-01-11T22:12:51.9703612Z OK (skipped=1) 2023-01-11T22:12:51.9703630Z 2023-01-11T22:12:51.9703757Z Generating XML reports... 2023-01-11T22:12:51.9704210Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220911.xml 2023-01-11T22:12:51.9704581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9704759Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9705136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9705390Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9705410Z 2023-01-11T22:12:51.9705523Z Running tests... 2023-01-11T22:12:51.9705771Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9706081Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9706385Z test_reduce_group_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9706613Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37334 2023-01-11T22:12:51.9706833Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37335 2023-01-11T22:12:51.9707208Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9707385Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9707762Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9707959Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9708307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9708484Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9708865Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9709056Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9709302Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9709545Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9709929Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9710330Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9710562Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9710790Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9710954Z skip: Skipped due to small world size. (4.204s) 2023-01-11T22:12:51.9710974Z 2023-01-11T22:12:51.9711241Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9711356Z Ran 1 test in 4.204s 2023-01-11T22:12:51.9711375Z 2023-01-11T22:12:51.9711484Z OK (skipped=1) 2023-01-11T22:12:51.9711503Z 2023-01-11T22:12:51.9711610Z Generating XML reports... 2023-01-11T22:12:51.9712055Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220917.xml 2023-01-11T22:12:51.9712427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9712605Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9712981Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9713177Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9713196Z 2023-01-11T22:12:51.9713306Z Running tests... 2023-01-11T22:12:51.9713571Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9713880Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9714127Z test_reduce_group_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9714347Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37437 2023-01-11T22:12:51.9714630Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37438 2023-01-11T22:12:51.9715006Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9715183Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9715602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9715800Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9716175Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9716333Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9716710Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9716905Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9717150Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9717395Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9717792Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9718190Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9718420Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9718644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9718786Z skip: Skipped due to small world size. (4.232s) 2023-01-11T22:12:51.9718806Z 2023-01-11T22:12:51.9719080Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9719194Z Ran 1 test in 4.232s 2023-01-11T22:12:51.9719214Z 2023-01-11T22:12:51.9719323Z OK (skipped=1) 2023-01-11T22:12:51.9719343Z 2023-01-11T22:12:51.9719467Z Generating XML reports... 2023-01-11T22:12:51.9719914Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220924.xml 2023-01-11T22:12:51.9720288Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9720466Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9720839Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9721012Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9721031Z 2023-01-11T22:12:51.9721141Z Running tests... 2023-01-11T22:12:51.9721407Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9721719Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9721975Z test_reduce_group_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9722195Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37540 2023-01-11T22:12:51.9722416Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37541 2023-01-11T22:12:51.9722785Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9722941Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9723315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9723503Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9723937Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9724114Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9724490Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9724679Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9724975Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9725227Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9725613Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9726007Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9726240Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9726468Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9726628Z skip: Skipped due to small world size. (4.237s) 2023-01-11T22:12:51.9726647Z 2023-01-11T22:12:51.9726912Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9727030Z Ran 1 test in 4.237s 2023-01-11T22:12:51.9727049Z 2023-01-11T22:12:51.9727160Z OK (skipped=1) 2023-01-11T22:12:51.9727179Z 2023-01-11T22:12:51.9727304Z Generating XML reports... 2023-01-11T22:12:51.9727732Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220931.xml 2023-01-11T22:12:51.9728104Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9728284Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9728664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9728857Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9728876Z 2023-01-11T22:12:51.9728987Z Running tests... 2023-01-11T22:12:51.9729251Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9729562Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9729790Z test_reduce_max (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9730010Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37643 2023-01-11T22:12:51.9730227Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37644 2023-01-11T22:12:51.9730601Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9730780Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9731157Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9731348Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9731717Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9731875Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9732251Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9732440Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9732682Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9733132Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9733542Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9733940Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9734245Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9734598Z STAGE:2023-01-11 22:09:41 37643:37643 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9734809Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9735164Z STAGE:2023-01-11 22:09:41 37644:37644 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9735444Z [1673474981.955085] [7e0e28e30a97:37644:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9735685Z [1673474982.975509] [7e0e28e30a97:37644:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9735927Z [1673474982.975509] [7e0e28e30a97:37644:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9736203Z [1673474981.932445] [7e0e28e30a97:37643:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9736437Z [1673474982.981231] [7e0e28e30a97:37643:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9736674Z [1673474982.981231] [7e0e28e30a97:37643:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9737229Z STAGE:2023-01-11 22:09:43 37644:37644 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:09:43 37643:37643 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9737254Z 2023-01-11T22:12:51.9737608Z STAGE:2023-01-11 22:09:43 37644:37644 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9737956Z STAGE:2023-01-11 22:09:43 37643:37643 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9738269Z STAGE:2023-01-11 22:09:43 37643:37643 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9738589Z STAGE:2023-01-11 22:09:43 37644:37644 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9738924Z STAGE:2023-01-11 22:09:43 37643:37643 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9739250Z STAGE:2023-01-11 22:09:43 37644:37644 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9739593Z STAGE:2023-01-11 22:09:43 37643:37643 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9739940Z STAGE:2023-01-11 22:09:43 37644:37644 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9740043Z ok (5.842s) 2023-01-11T22:12:51.9740064Z 2023-01-11T22:12:51.9740328Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9740443Z Ran 1 test in 5.842s 2023-01-11T22:12:51.9740463Z 2023-01-11T22:12:51.9740538Z OK 2023-01-11T22:12:51.9740560Z 2023-01-11T22:12:51.9740687Z Generating XML reports... 
2023-01-11T22:12:51.9741139Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220938.xml 2023-01-11T22:12:51.9741507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9741687Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9742065Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9742342Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9742362Z 2023-01-11T22:12:51.9742474Z Running tests... 2023-01-11T22:12:51.9742727Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9743044Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9743355Z test_reduce_min (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9743584Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37757 2023-01-11T22:12:51.9743805Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37758 2023-01-11T22:12:51.9744186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9744363Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9744747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9744938Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9745286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9745466Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9745845Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9746035Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9746282Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9746530Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9746939Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9747336Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9747568Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9747887Z STAGE:2023-01-11 22:09:50 37758:37758 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9748114Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9748449Z STAGE:2023-01-11 22:09:50 37757:37757 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9748726Z [1673474990.276956] [7e0e28e30a97:37757:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9748967Z [1673474991.322096] [7e0e28e30a97:37757:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9749207Z [1673474991.322096] [7e0e28e30a97:37757:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9749482Z [1673474990.299013] [7e0e28e30a97:37758:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9749716Z [1673474991.322891] [7e0e28e30a97:37758:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9749955Z [1673474991.322891] [7e0e28e30a97:37758:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9750507Z STAGE:2023-01-11 22:09:51 37757:37757 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:09:51 37758:37758 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9750585Z 2023-01-11T22:12:51.9750931Z STAGE:2023-01-11 22:09:51 37758:37758 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9751280Z STAGE:2023-01-11 22:09:51 37757:37757 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9751608Z STAGE:2023-01-11 22:09:51 37758:37758 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9751974Z STAGE:2023-01-11 22:09:51 37757:37757 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9752322Z STAGE:2023-01-11 22:09:51 37758:37758 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9752876Z STAGE:2023-01-11 22:09:51 37757:37757 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:09:51 37758:37758 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9752896Z 2023-01-11T22:12:51.9753241Z STAGE:2023-01-11 22:09:51 37757:37757 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9753350Z ok (5.836s) 2023-01-11T22:12:51.9753370Z 2023-01-11T22:12:51.9753636Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9753750Z Ran 1 test in 5.837s 2023-01-11T22:12:51.9753770Z 2023-01-11T22:12:51.9753846Z OK 2023-01-11T22:12:51.9753865Z 2023-01-11T22:12:51.9753993Z Generating XML reports... 
2023-01-11T22:12:51.9754451Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220946.xml 2023-01-11T22:12:51.9754822Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9755001Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9755383Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9755580Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9755599Z 2023-01-11T22:12:51.9755712Z Running tests... 2023-01-11T22:12:51.9755957Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9756273Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9756554Z test_reduce_multigpu (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl backend supports reduce multigpu (0.002s) 2023-01-11T22:12:51.9756574Z 2023-01-11T22:12:51.9756839Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9756953Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9756972Z 2023-01-11T22:12:51.9757081Z OK (skipped=1) 2023-01-11T22:12:51.9757100Z 2023-01-11T22:12:51.9757226Z Generating XML reports... 
2023-01-11T22:12:51.9757673Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220954.xml 2023-01-11T22:12:51.9758049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9758208Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9758586Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9758779Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9758802Z 2023-01-11T22:12:51.9758917Z Running tests... 2023-01-11T22:12:51.9759186Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9759502Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9759757Z test_reduce_product (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9759977Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37904 2023-01-11T22:12:51.9760261Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37905 2023-01-11T22:12:51.9760622Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9760799Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9761181Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9761428Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9761808Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9761986Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9762363Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9762557Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9762788Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9763032Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9763436Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9763835Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9764066Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9764296Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9764633Z STAGE:2023-01-11 22:10:00 37904:37904 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9764963Z STAGE:2023-01-11 22:10:00 37905:37905 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9765243Z [1673475000.948024] [7e0e28e30a97:37904:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9765460Z [1673475002.018263] [7e0e28e30a97:37904:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9765705Z [1673475002.018263] [7e0e28e30a97:37904:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9765983Z [1673475000.948048] [7e0e28e30a97:37905:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9766212Z [1673475002.010983] [7e0e28e30a97:37905:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9766450Z [1673475002.010983] [7e0e28e30a97:37905:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9767007Z STAGE:2023-01-11 22:10:02 37904:37904 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:10:02 37905:37905 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9767028Z 2023-01-11T22:12:51.9767376Z STAGE:2023-01-11 22:10:02 37905:37905 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9767725Z STAGE:2023-01-11 22:10:02 37904:37904 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9768050Z STAGE:2023-01-11 22:10:02 37904:37904 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9768369Z STAGE:2023-01-11 22:10:02 37905:37905 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9768700Z STAGE:2023-01-11 22:10:02 37904:37904 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9769100Z STAGE:2023-01-11 22:10:02 37904:37904 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9769433Z STAGE:2023-01-11 22:10:02 37905:37905 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9769778Z STAGE:2023-01-11 22:10:02 37905:37905 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9769883Z ok (5.730s) 2023-01-11T22:12:51.9769903Z 2023-01-11T22:12:51.9770214Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9770338Z Ran 1 test in 5.730s 2023-01-11T22:12:51.9770358Z 2023-01-11T22:12:51.9770453Z OK 2023-01-11T22:12:51.9770471Z 2023-01-11T22:12:51.9770597Z Generating XML reports... 
2023-01-11T22:12:51.9771036Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220957.xml 2023-01-11T22:12:51.9771410Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9771592Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9771969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9772162Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9772182Z 2023-01-11T22:12:51.9772292Z Running tests... 2023-01-11T22:12:51.9772559Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9773083Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9773543Z test_reduce_scatter_tensor_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA reduce_scatter_tensor (0.002s) 2023-01-11T22:12:51.9773565Z 2023-01-11T22:12:51.9773824Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9773938Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9773963Z 2023-01-11T22:12:51.9774072Z OK (skipped=1) 2023-01-11T22:12:51.9774091Z 2023-01-11T22:12:51.9774217Z Generating XML reports... 
2023-01-11T22:12:51.9774666Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221005.xml 2023-01-11T22:12:51.9775038Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9775220Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9775598Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9775790Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9775810Z 2023-01-11T22:12:51.9775902Z Running tests... 2023-01-11T22:12:51.9776164Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9776477Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9776755Z test_reduce_scatter_v_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports reduce_scatter_v (0.003s) 2023-01-11T22:12:51.9776774Z 2023-01-11T22:12:51.9777036Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9777150Z Ran 1 test in 0.003s 2023-01-11T22:12:51.9777169Z 2023-01-11T22:12:51.9777280Z OK (skipped=1) 2023-01-11T22:12:51.9777300Z 2023-01-11T22:12:51.9777429Z Generating XML reports... 
2023-01-11T22:12:51.9777874Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221007.xml 2023-01-11T22:12:51.9778227Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9778406Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9778784Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9779071Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9779092Z 2023-01-11T22:12:51.9779202Z Running tests... 2023-01-11T22:12:51.9779474Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9779786Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9780092Z test_reduce_sum (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9780303Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38084 2023-01-11T22:12:51.9780522Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38085 2023-01-11T22:12:51.9780899Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9781076Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9781462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9781653Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9782022Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9782203Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9782582Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9782755Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9783001Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9783244Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9783644Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9784039Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9784270Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9784612Z STAGE:2023-01-11 22:10:14 38084:38084 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9784841Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9785155Z STAGE:2023-01-11 22:10:14 38085:38085 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9785431Z [1673475014.168401] [7e0e28e30a97:38084:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9785670Z [1673475015.221436] [7e0e28e30a97:38084:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9785910Z [1673475015.221436] [7e0e28e30a97:38084:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9786185Z [1673475014.192137] [7e0e28e30a97:38085:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9786418Z [1673475015.246769] [7e0e28e30a97:38085:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9786657Z [1673475015.246769] [7e0e28e30a97:38085:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9787214Z STAGE:2023-01-11 22:10:15 38084:38084 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:10:15 38085:38085 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9787289Z 2023-01-11T22:12:51.9787870Z STAGE:2023-01-11 22:10:15 38084:38084 ActivityProfilerController.cpp:310] Completed Stage: Post ProcessingSTAGE:2023-01-11 22:10:15 38085:38085 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9787891Z 2023-01-11T22:12:51.9788218Z STAGE:2023-01-11 22:10:15 38085:38085 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9788582Z STAGE:2023-01-11 22:10:15 38084:38084 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9788927Z STAGE:2023-01-11 22:10:15 38085:38085 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9789234Z STAGE:2023-01-11 22:10:15 38084:38084 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9789579Z STAGE:2023-01-11 22:10:15 38085:38085 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9789922Z STAGE:2023-01-11 22:10:15 38084:38084 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9790030Z ok (5.939s) 2023-01-11T22:12:51.9790050Z 2023-01-11T22:12:51.9790317Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9790432Z Ran 1 test in 5.939s 2023-01-11T22:12:51.9790452Z 2023-01-11T22:12:51.9790544Z OK 2023-01-11T22:12:51.9790564Z 2023-01-11T22:12:51.9790688Z Generating XML reports... 
2023-01-11T22:12:51.9791139Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221010.xml 2023-01-11T22:12:51.9791496Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9791675Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9792051Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9792248Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9792267Z 2023-01-11T22:12:51.9792378Z Running tests... 2023-01-11T22:12:51.9792641Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9792951Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9793211Z test_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA reduce (0.002s) 2023-01-11T22:12:51.9793233Z 2023-01-11T22:12:51.9793485Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9793598Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9793617Z 2023-01-11T22:12:51.9793727Z OK (skipped=1) 2023-01-11T22:12:51.9793746Z 2023-01-11T22:12:51.9793871Z Generating XML reports... 
2023-01-11T22:12:51.9794317Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221018.xml 2023-01-11T22:12:51.9794690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9794866Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9795245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9795437Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9795456Z 2023-01-11T22:12:51.9795551Z Running tests... 2023-01-11T22:12:51.9795813Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9796123Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9796392Z test_reduce_sum_cuda_twice (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA reduce (0.002s) 2023-01-11T22:12:51.9796411Z 2023-01-11T22:12:51.9796671Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9796847Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9796866Z 2023-01-11T22:12:51.9796975Z OK (skipped=1) 2023-01-11T22:12:51.9796994Z 2023-01-11T22:12:51.9797118Z Generating XML reports... 
2023-01-11T22:12:51.9797570Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221021.xml 2023-01-11T22:12:51.9797970Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9798153Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9798538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9798729Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9798749Z 2023-01-11T22:12:51.9798858Z Running tests... 2023-01-11T22:12:51.9799120Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9799435Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9799690Z test_reduce_sum_twice (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9799893Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38264 2023-01-11T22:12:51.9800113Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38265 2023-01-11T22:12:51.9800485Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9800659Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9801036Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9801226Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9801591Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9801771Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9802149Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9802321Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9802568Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9802815Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9803212Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9803608Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9803841Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9804178Z STAGE:2023-01-11 22:10:27 38265:38265 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9804404Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9804734Z STAGE:2023-01-11 22:10:27 38264:38264 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9804997Z [1673475027.357032] [7e0e28e30a97:38265:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9805233Z [1673475028.384189] [7e0e28e30a97:38265:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9805472Z [1673475028.384189] [7e0e28e30a97:38265:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9805745Z [1673475027.335109] [7e0e28e30a97:38264:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9806038Z [1673475028.374879] [7e0e28e30a97:38264:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9806276Z [1673475028.374879] [7e0e28e30a97:38264:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9806886Z STAGE:2023-01-11 22:10:28 38265:38265 ActivityProfilerController.cpp:306] Completed Stage: CollectionSTAGE:2023-01-11 22:10:28 38264:38264 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9806909Z 2023-01-11T22:12:51.9807272Z STAGE:2023-01-11 22:10:28 38265:38265 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9807616Z STAGE:2023-01-11 22:10:28 38264:38264 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9807945Z STAGE:2023-01-11 22:10:28 38264:38264 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9808245Z STAGE:2023-01-11 22:10:28 38265:38265 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9808577Z STAGE:2023-01-11 22:10:28 38264:38264 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9808918Z STAGE:2023-01-11 22:10:28 38264:38264 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9809252Z STAGE:2023-01-11 22:10:28 38265:38265 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9809593Z STAGE:2023-01-11 22:10:28 38265:38265 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9809695Z ok (5.958s) 2023-01-11T22:12:51.9809714Z 2023-01-11T22:12:51.9809980Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9810094Z Ran 1 test in 5.958s 2023-01-11T22:12:51.9810113Z 2023-01-11T22:12:51.9810210Z OK 2023-01-11T22:12:51.9810229Z 2023-01-11T22:12:51.9810337Z Generating XML reports... 
2023-01-11T22:12:51.9810786Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221023.xml 2023-01-11T22:12:51.9811159Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9811337Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9811717Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9811910Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9811930Z 2023-01-11T22:12:51.9812039Z Running tests... 2023-01-11T22:12:51.9812300Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9812595Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9812996Z test_scatter (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:12:51.9813019Z 2023-01-11T22:12:51.9813293Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9813408Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9813427Z 2023-01-11T22:12:51.9813537Z OK (skipped=1) 2023-01-11T22:12:51.9813556Z 2023-01-11T22:12:51.9813684Z Generating XML reports... 
2023-01-11T22:12:51.9814130Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221031.xml 2023-01-11T22:12:51.9814504Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9814680Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9815037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9815354Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9815374Z 2023-01-11T22:12:51.9815486Z Running tests... 2023-01-11T22:12:51.9815752Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9816063Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9816394Z test_scatter_checks (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:12:51.9816416Z 2023-01-11T22:12:51.9816692Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9816804Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9816823Z 2023-01-11T22:12:51.9816930Z OK (skipped=1) 2023-01-11T22:12:51.9816950Z 2023-01-11T22:12:51.9817057Z Generating XML reports... 
2023-01-11T22:12:51.9817501Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221034.xml 2023-01-11T22:12:51.9817876Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9818051Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9818427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9818621Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9818642Z 2023-01-11T22:12:51.9818749Z Running tests... 2023-01-11T22:12:51.9819010Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9819322Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9819576Z test_scatter_complex (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:12:51.9819596Z 2023-01-11T22:12:51.9819855Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9819972Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9819990Z 2023-01-11T22:12:51.9820098Z OK (skipped=1) 2023-01-11T22:12:51.9820117Z 2023-01-11T22:12:51.9820241Z Generating XML reports... 
2023-01-11T22:12:51.9820683Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221036.xml 2023-01-11T22:12:51.9821056Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9821232Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9821592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9821783Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9821802Z 2023-01-11T22:12:51.9821912Z Running tests... 2023-01-11T22:12:51.9822176Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9822495Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9822750Z test_scatter_cuda (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA gather (0.002s) 2023-01-11T22:12:51.9822770Z 2023-01-11T22:12:51.9823032Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9823142Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9823164Z 2023-01-11T22:12:51.9823274Z OK (skipped=1) 2023-01-11T22:12:51.9823294Z 2023-01-11T22:12:51.9823400Z Generating XML reports... 
2023-01-11T22:12:51.9823841Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221039.xml 2023-01-11T22:12:51.9824209Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9824382Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9824830Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9825025Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9825044Z 2023-01-11T22:12:51.9825154Z Running tests... 2023-01-11T22:12:51.9825417Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9825770Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9826025Z test_scatter_cuda_complex (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl supports CUDA gather (0.002s) 2023-01-11T22:12:51.9826045Z 2023-01-11T22:12:51.9826312Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9826424Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9826443Z 2023-01-11T22:12:51.9826550Z OK (skipped=1) 2023-01-11T22:12:51.9826569Z 2023-01-11T22:12:51.9826699Z Generating XML reports... 
2023-01-11T22:12:51.9827145Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221041.xml 2023-01-11T22:12:51.9827514Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9827690Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9828072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9828248Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9828267Z 2023-01-11T22:12:51.9828376Z Running tests... 2023-01-11T22:12:51.9828636Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9828952Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9829218Z test_scatter_full_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:12:51.9829241Z 2023-01-11T22:12:51.9829502Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9829614Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9829633Z 2023-01-11T22:12:51.9829742Z OK (skipped=1) 2023-01-11T22:12:51.9829761Z 2023-01-11T22:12:51.9829868Z Generating XML reports... 
2023-01-11T22:12:51.9830308Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221043.xml 2023-01-11T22:12:51.9841180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9841405Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9841847Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9842045Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9842074Z 2023-01-11T22:12:51.9842187Z Running tests... 2023-01-11T22:12:51.9842462Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9842778Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9843047Z test_scatter_group (__main__.TestDistBackendWithSpawn) ... skip: CPU tensor ops not supported by UCP TL (0.002s) 2023-01-11T22:12:51.9843072Z 2023-01-11T22:12:51.9843334Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9843451Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9843471Z 2023-01-11T22:12:51.9843564Z OK (skipped=1) 2023-01-11T22:12:51.9843599Z 2023-01-11T22:12:51.9843707Z Generating XML reports... 
2023-01-11T22:12:51.9844158Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221046.xml 2023-01-11T22:12:51.9844531Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9844823Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9845214Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9845409Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9845429Z 2023-01-11T22:12:51.9845593Z Running tests... 2023-01-11T22:12:51.9845869Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9846164Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9846549Z test_scatter_object_list (__main__.TestDistBackendWithSpawn) ... skip: Test requires backend to be one of {'gloo'} (0.002s) 2023-01-11T22:12:51.9846568Z 2023-01-11T22:12:51.9846824Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9846942Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9846962Z 2023-01-11T22:12:51.9847070Z OK (skipped=1) 2023-01-11T22:12:51.9847089Z 2023-01-11T22:12:51.9847214Z Generating XML reports... 
2023-01-11T22:12:51.9847656Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221048.xml 2023-01-11T22:12:51.9848031Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9848208Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9848570Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9848761Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9848781Z 2023-01-11T22:12:51.9848889Z Running tests... 2023-01-11T22:12:51.9849151Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9849464Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9849708Z test_send_recv (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9849929Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38642 2023-01-11T22:12:51.9850148Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38643 2023-01-11T22:12:51.9850506Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9850684Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9851066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9851263Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9851629Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9851806Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9852182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9852371Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9852617Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9852848Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9853451Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9853846Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9854186Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9854415Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9854703Z [1673475054.978089] [7e0e28e30a97:38642:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9855126Z [1673475055.751932] [7e0e28e30a97:38642:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9855385Z [1673475055.751932] [7e0e28e30a97:38642:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9855658Z [1673475055.001082] [7e0e28e30a97:38643:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9855872Z [1673475055.776346] [7e0e28e30a97:38643:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9856117Z [1673475055.776346] [7e0e28e30a97:38643:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9856217Z ok (5.438s) 2023-01-11T22:12:51.9856238Z 2023-01-11T22:12:51.9856522Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9856635Z Ran 1 test in 5.438s 2023-01-11T22:12:51.9856655Z 2023-01-11T22:12:51.9856749Z OK 2023-01-11T22:12:51.9856771Z 2023-01-11T22:12:51.9856896Z Generating XML reports... 
2023-01-11T22:12:51.9857348Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221051.xml 2023-01-11T22:12:51.9857721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9857881Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9858261Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9858457Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9858476Z 2023-01-11T22:12:51.9858673Z Running tests... 2023-01-11T22:12:51.9858943Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9859254Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9859542Z test_send_recv_any_source (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support send/recv from any source (0.002s) 2023-01-11T22:12:51.9859563Z 2023-01-11T22:12:51.9859829Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9859942Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9859962Z 2023-01-11T22:12:51.9860053Z OK (skipped=1) 2023-01-11T22:12:51.9860072Z 2023-01-11T22:12:51.9860196Z Generating XML reports... 
2023-01-11T22:12:51.9860639Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221059.xml 2023-01-11T22:12:51.9861013Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9861190Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9861564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9861759Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9861779Z 2023-01-11T22:12:51.9861889Z Running tests... 2023-01-11T22:12:51.9862133Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9862447Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9862757Z test_send_recv_any_source_autograd_profiler (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support send/recv from any source (0.002s) 2023-01-11T22:12:51.9862844Z 2023-01-11T22:12:51.9863116Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9863231Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9863251Z 2023-01-11T22:12:51.9863359Z OK (skipped=1) 2023-01-11T22:12:51.9863378Z 2023-01-11T22:12:51.9863501Z Generating XML reports... 
2023-01-11T22:12:51.9863996Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221101.xml 2023-01-11T22:12:51.9864383Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9864541Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9864918Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9865107Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9865132Z 2023-01-11T22:12:51.9865240Z Running tests... 2023-01-11T22:12:51.9865505Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9865815Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9866115Z test_send_recv_any_source_torch_profiler (__main__.TestDistBackendWithSpawn) ... skip: ucc does not support send/recv from any source (0.002s) 2023-01-11T22:12:51.9866135Z 2023-01-11T22:12:51.9866397Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9866510Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9866530Z 2023-01-11T22:12:51.9866621Z OK (skipped=1) 2023-01-11T22:12:51.9866656Z 2023-01-11T22:12:51.9866763Z Generating XML reports... 
2023-01-11T22:12:51.9867208Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221103.xml 2023-01-11T22:12:51.9867577Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9867757Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9868134Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9868325Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9868346Z 2023-01-11T22:12:51.9868457Z Running tests... 2023-01-11T22:12:51.9868720Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9869012Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9869287Z test_send_recv_autograd_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9869509Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38851 2023-01-11T22:12:51.9869726Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38852 2023-01-11T22:12:51.9870099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9870274Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9870652Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9870845Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9871194Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9871369Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9871747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9871937Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9872252Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9872499Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9872908Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9873354Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9873591Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9873802Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9874145Z STAGE:2023-01-11 22:11:10 38852:38852 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9874470Z STAGE:2023-01-11 22:11:10 38851:38851 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9874755Z [1673475070.037089] [7e0e28e30a97:38852:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9874990Z [1673475071.103696] [7e0e28e30a97:38852:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9875234Z [1673475071.103696] [7e0e28e30a97:38852:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9875574Z STAGE:2023-01-11 22:11:11 38852:38852 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9875922Z STAGE:2023-01-11 22:11:11 38852:38852 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9876193Z [1673475070.037121] [7e0e28e30a97:38851:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9876430Z [1673475071.103325] [7e0e28e30a97:38851:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9876651Z [1673475071.103325] [7e0e28e30a97:38851:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9876988Z STAGE:2023-01-11 22:11:11 38851:38851 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9877337Z STAGE:2023-01-11 22:11:11 38851:38851 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9877440Z ok (5.748s) 2023-01-11T22:12:51.9877461Z 2023-01-11T22:12:51.9877728Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9877841Z Ran 1 test in 5.748s 2023-01-11T22:12:51.9877861Z 2023-01-11T22:12:51.9877953Z OK 2023-01-11T22:12:51.9877973Z 2023-01-11T22:12:51.9878097Z Generating XML reports... 2023-01-11T22:12:51.9878543Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221106.xml 2023-01-11T22:12:51.9878905Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9879085Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9879471Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9879669Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9879689Z 2023-01-11T22:12:51.9879799Z Running tests... 2023-01-11T22:12:51.9880063Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9880372Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9880613Z test_send_recv_nccl (__main__.TestDistBackendWithSpawn) ... 
skip: NCCL Send Recv Only (0.002s) 2023-01-11T22:12:51.9880691Z 2023-01-11T22:12:51.9880951Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9881066Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9881086Z 2023-01-11T22:12:51.9881194Z OK (skipped=1) 2023-01-11T22:12:51.9881213Z 2023-01-11T22:12:51.9881341Z Generating XML reports... 2023-01-11T22:12:51.9881784Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221114.xml 2023-01-11T22:12:51.9882221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9882407Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9882791Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9882985Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9883005Z 2023-01-11T22:12:51.9883097Z Running tests... 2023-01-11T22:12:51.9883364Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9883673Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9883938Z test_send_recv_nccl_autograd_profiler (__main__.TestDistBackendWithSpawn) ... skip: NCCL Send Recv Only (0.002s) 2023-01-11T22:12:51.9883959Z 2023-01-11T22:12:51.9884222Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9884339Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9884359Z 2023-01-11T22:12:51.9884468Z OK (skipped=1) 2023-01-11T22:12:51.9884487Z 2023-01-11T22:12:51.9884612Z Generating XML reports... 
2023-01-11T22:12:51.9885059Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221116.xml 2023-01-11T22:12:51.9885412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9885592Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9885971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9886162Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9886182Z 2023-01-11T22:12:51.9886290Z Running tests... 2023-01-11T22:12:51.9886553Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9886866Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9887126Z test_send_recv_nccl_torch_profiler (__main__.TestDistBackendWithSpawn) ... skip: NCCL Send Recv Only (0.002s) 2023-01-11T22:12:51.9887146Z 2023-01-11T22:12:51.9887404Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9887499Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9887518Z 2023-01-11T22:12:51.9887625Z OK (skipped=1) 2023-01-11T22:12:51.9887648Z 2023-01-11T22:12:51.9887775Z Generating XML reports... 
2023-01-11T22:12:51.9888217Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221119.xml 2023-01-11T22:12:51.9888586Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9888763Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9889141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9889332Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9889352Z 2023-01-11T22:12:51.9889444Z Running tests... 2023-01-11T22:12:51.9889706Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9890016Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9890346Z test_send_recv_torch_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9890570Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39064 2023-01-11T22:12:51.9890789Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39065 2023-01-11T22:12:51.9891165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9891392Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9891766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9891957Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9892327Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9892505Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9893044Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9893246Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9893494Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9893744Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9894150Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9894528Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9894754Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9895093Z STAGE:2023-01-11 22:11:25 39065:39065 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9895326Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9895659Z STAGE:2023-01-11 22:11:25 39064:39064 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9895940Z [1673475085.701165] [7e0e28e30a97:39065:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9896173Z [1673475086.750976] [7e0e28e30a97:39065:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9896415Z [1673475086.750976] [7e0e28e30a97:39065:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9896753Z STAGE:2023-01-11 22:11:27 39065:39065 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9897100Z STAGE:2023-01-11 22:11:27 39065:39065 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9897363Z [1673475085.680496] [7e0e28e30a97:39064:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9897592Z [1673475086.740316] [7e0e28e30a97:39064:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9897833Z [1673475086.740316] [7e0e28e30a97:39064:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9898173Z STAGE:2023-01-11 22:11:27 39064:39064 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9898516Z STAGE:2023-01-11 22:11:27 39064:39064 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9898620Z ok (5.955s) 2023-01-11T22:12:51.9898639Z 2023-01-11T22:12:51.9898906Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9899108Z Ran 1 test in 5.955s 2023-01-11T22:12:51.9899128Z 2023-01-11T22:12:51.9899220Z OK 2023-01-11T22:12:51.9899239Z 2023-01-11T22:12:51.9899348Z Generating XML reports... 2023-01-11T22:12:51.9899803Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221121.xml 2023-01-11T22:12:51.9900237Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9900421Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9900805Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9900998Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9901018Z 2023-01-11T22:12:51.9901127Z Running tests... 2023-01-11T22:12:51.9901393Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9901694Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9901955Z test_send_recv_with_tag (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9902175Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39178 2023-01-11T22:12:51.9902390Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39179 2023-01-11T22:12:51.9902766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9902941Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9903319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9903514Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9903882Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9904044Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9904419Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9904609Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9904857Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9905102Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9905501Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9905895Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9906130Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9906360Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9906619Z [1673475094.205976] [7e0e28e30a97:39179:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9906857Z [1673475094.989445] [7e0e28e30a97:39179:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9907097Z [1673475094.989445] [7e0e28e30a97:39179:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9907373Z [1673475094.185350] [7e0e28e30a97:39178:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9907603Z [1673475094.973008] [7e0e28e30a97:39178:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9907902Z [1673475094.973008] [7e0e28e30a97:39178:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9908006Z ok (5.546s) 2023-01-11T22:12:51.9908025Z 2023-01-11T22:12:51.9908303Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9908418Z Ran 1 test in 5.547s 2023-01-11T22:12:51.9908438Z 2023-01-11T22:12:51.9908514Z OK 2023-01-11T22:12:51.9908597Z 2023-01-11T22:12:51.9908711Z Generating XML reports... 
2023-01-11T22:12:51.9909169Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221130.xml 2023-01-11T22:12:51.9909538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9909715Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9910092Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9910290Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9910310Z 2023-01-11T22:12:51.9910420Z Running tests... 2023-01-11T22:12:51.9910684Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9910981Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9911270Z test_send_recv_with_tag_autograd_profiler (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9911491Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39288 2023-01-11T22:12:51.9911707Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39289 2023-01-11T22:12:51.9912078Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9912258Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9912635Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9912827Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9913177Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9913356Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9913734Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9913924Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9914169Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9914412Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9914819Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9915217Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9915447Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9915769Z STAGE:2023-01-11 22:11:42 39288:39288 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9915997Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9916331Z STAGE:2023-01-11 22:11:42 39289:39289 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9916609Z [1673475102.276275] [7e0e28e30a97:39288:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9916909Z [1673475103.307160] [7e0e28e30a97:39288:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9917149Z [1673475103.307160] [7e0e28e30a97:39288:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9917494Z STAGE:2023-01-11 22:11:43 39288:39288 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9917815Z [1673475102.296674] [7e0e28e30a97:39289:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9918054Z [1673475103.328992] [7e0e28e30a97:39289:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9918289Z [1673475103.328992] [7e0e28e30a97:39289:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9918618Z STAGE:2023-01-11 22:11:43 39289:39289 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9918970Z STAGE:2023-01-11 22:11:43 39288:39288 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9919313Z STAGE:2023-01-11 22:11:43 39289:39289 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9919415Z ok (5.937s) 2023-01-11T22:12:51.9919435Z 2023-01-11T22:12:51.9919703Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9919816Z Ran 1 test in 5.937s 2023-01-11T22:12:51.9919837Z 2023-01-11T22:12:51.9919930Z OK 2023-01-11T22:12:51.9919949Z 2023-01-11T22:12:51.9920073Z Generating XML reports... 2023-01-11T22:12:51.9920503Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221138.xml 2023-01-11T22:12:51.9920872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9921054Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9921435Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9921629Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9921649Z 2023-01-11T22:12:51.9921757Z Running tests... 
2023-01-11T22:12:51.9922027Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9922339Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9922617Z test_send_recv_with_tag_torch_profiler (__main__.TestDistBackendWithSpawn) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9922821Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39402 2023-01-11T22:12:51.9923039Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39403 2023-01-11T22:12:51.9923414Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9923592Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9923969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9924159Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9924531Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9924707Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9925084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9925256Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9925500Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9925804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9926208Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9926601Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9926874Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9927227Z STAGE:2023-01-11 22:11:50 39402:39402 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9927455Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9927769Z STAGE:2023-01-11 22:11:50 39403:39403 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T22:12:51.9928049Z [1673475110.806285] [7e0e28e30a97:39402:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9928285Z [1673475111.855507] [7e0e28e30a97:39402:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9928523Z [1673475111.855507] [7e0e28e30a97:39402:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9928867Z STAGE:2023-01-11 22:11:52 39402:39402 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9929139Z [1673475110.827214] [7e0e28e30a97:39403:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9929367Z [1673475111.854754] [7e0e28e30a97:39403:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9929603Z [1673475111.854754] [7e0e28e30a97:39403:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9929950Z STAGE:2023-01-11 22:11:52 39403:39403 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T22:12:51.9930301Z STAGE:2023-01-11 22:11:52 39402:39402 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9930632Z STAGE:2023-01-11 22:11:52 39403:39403 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T22:12:51.9930741Z ok (5.942s) 2023-01-11T22:12:51.9930761Z 2023-01-11T22:12:51.9931028Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9931142Z Ran 1 test in 5.942s 2023-01-11T22:12:51.9931162Z 2023-01-11T22:12:51.9931254Z OK 2023-01-11T22:12:51.9931273Z 2023-01-11T22:12:51.9931398Z Generating XML reports... 2023-01-11T22:12:51.9931849Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221146.xml 2023-01-11T22:12:51.9932225Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9932401Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9932763Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9933154Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9933182Z 2023-01-11T22:12:51.9933297Z Running tests... 
2023-01-11T22:12:51.9933566Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9933878Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9934159Z test_sparse_all_reduce_sum (__main__.TestDistBackendWithSpawn) ... skip: Only Gloo backend support sparse all reduce (0.002s) 2023-01-11T22:12:51.9934179Z 2023-01-11T22:12:51.9934441Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9934643Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9934662Z 2023-01-11T22:12:51.9934772Z OK (skipped=1) 2023-01-11T22:12:51.9934792Z 2023-01-11T22:12:51.9934899Z Generating XML reports... 2023-01-11T22:12:51.9935349Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221155.xml 2023-01-11T22:12:51.9935782Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9935991Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9936378Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9936569Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9936589Z 2023-01-11T22:12:51.9936698Z Running tests... 2023-01-11T22:12:51.9936965Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9937262Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9937553Z test_sparse_all_reduce_sum_cuda (__main__.TestDistBackendWithSpawn) ... 
skip: Only Gloo backend support sparse all reduce (0.002s) 2023-01-11T22:12:51.9937573Z 2023-01-11T22:12:51.9937835Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9937947Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9937970Z 2023-01-11T22:12:51.9938079Z OK (skipped=1) 2023-01-11T22:12:51.9938098Z 2023-01-11T22:12:51.9938221Z Generating XML reports... 2023-01-11T22:12:51.9938668Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221157.xml 2023-01-11T22:12:51.9939039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9939216Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9939581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9939776Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9939795Z 2023-01-11T22:12:51.9939904Z Running tests... 2023-01-11T22:12:51.9940170Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9940485Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9940753Z test_stateless_api_with_ddp (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9940973Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39582 2023-01-11T22:12:51.9941190Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39583 2023-01-11T22:12:51.9941562Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9941725Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9942106Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9942298Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9942668Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9942844Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9943222Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9943414Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9943661Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9943951Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9944356Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9944748Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9945038Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9945271Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9945546Z [1673475124.860832] [7e0e28e30a97:39583:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9945779Z [1673475124.866264] [7e0e28e30a97:39583:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9946022Z [1673475124.866264] [7e0e28e30a97:39583:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9946294Z [1673475124.852671] [7e0e28e30a97:39582:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9946523Z [1673475124.859557] [7e0e28e30a97:39582:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9946746Z [1673475124.859557] [7e0e28e30a97:39582:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9946851Z ok (5.955s) 2023-01-11T22:12:51.9946870Z 2023-01-11T22:12:51.9947150Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9947266Z Ran 1 test in 5.956s 2023-01-11T22:12:51.9947286Z 2023-01-11T22:12:51.9947378Z OK 2023-01-11T22:12:51.9947398Z 2023-01-11T22:12:51.9947520Z Generating XML reports... 
2023-01-11T22:12:51.9947974Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221200.xml 2023-01-11T22:12:51.9948347Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9948524Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9948893Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9949087Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9949107Z 2023-01-11T22:12:51.9949217Z Running tests... 2023-01-11T22:12:51.9949483Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9949793Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9950057Z test_static_graph_api_cpu (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9950278Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39700 2023-01-11T22:12:51.9950497Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39701 2023-01-11T22:12:51.9950852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9951033Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9951416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9951605Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9951973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9952148Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9952596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9952790Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9953033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9953260Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9953708Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9954120Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9954351Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9954579Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9954858Z [1673475132.561233] [7e0e28e30a97:39700:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9955093Z [1673475133.352379] [7e0e28e30a97:39700:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9955333Z [1673475133.352379] [7e0e28e30a97:39700:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9955605Z [1673475132.585407] [7e0e28e30a97:39701:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9955816Z [1673475133.354371] [7e0e28e30a97:39701:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9956055Z [1673475133.354371] [7e0e28e30a97:39701:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9956161Z ok (5.429s) 2023-01-11T22:12:51.9956181Z 2023-01-11T22:12:51.9956459Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9956570Z Ran 1 test in 5.429s 2023-01-11T22:12:51.9956590Z 2023-01-11T22:12:51.9956683Z OK 2023-01-11T22:12:51.9956702Z 2023-01-11T22:12:51.9956826Z Generating XML reports... 
2023-01-11T22:12:51.9957274Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221208.xml 2023-01-11T22:12:51.9957645Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9957806Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9958183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9958376Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9958399Z 2023-01-11T22:12:51.9958507Z Running tests... 2023-01-11T22:12:51.9958775Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9959085Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9959389Z test_sync_bn_logged (__main__.TestDistBackendWithSpawn) ... skip: Only Nccl & Gloo backend support DistributedDataParallel (0.002s) 2023-01-11T22:12:51.9959409Z 2023-01-11T22:12:51.9959676Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9959789Z Ran 1 test in 0.002s 2023-01-11T22:12:51.9959809Z 2023-01-11T22:12:51.9959902Z OK (skipped=1) 2023-01-11T22:12:51.9959921Z 2023-01-11T22:12:51.9960045Z Generating XML reports... 
2023-01-11T22:12:51.9960488Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221216.xml 2023-01-11T22:12:51.9960859Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9961103Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9961488Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9961678Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9961699Z 2023-01-11T22:12:51.9961809Z Running tests... 2023-01-11T22:12:51.9962129Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9962492Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9962784Z test_undefined_grad_parity_unused_parameters (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9963005Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39847 2023-01-11T22:12:51.9963225Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39848 2023-01-11T22:12:51.9963603Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9963783Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9964163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9964357Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9964710Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9964885Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9965265Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9965455Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9965706Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9965950Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9966348Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9966746Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9966975Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9967185Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9967462Z [1673475143.548075] [7e0e28e30a97:39847:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9967698Z [1673475143.553421] [7e0e28e30a97:39847:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9967938Z [1673475143.553421] [7e0e28e30a97:39847:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9968720Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:12:51.9968993Z [1673475143.553068] [7e0e28e30a97:39848:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9969298Z [1673475143.559033] [7e0e28e30a97:39848:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9969536Z [1673475143.559033] [7e0e28e30a97:39848:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9970348Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. 
This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:12:51.9970456Z ok (5.852s) 2023-01-11T22:12:51.9970476Z 2023-01-11T22:12:51.9970752Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9970871Z Ran 1 test in 5.852s 2023-01-11T22:12:51.9970891Z 2023-01-11T22:12:51.9970984Z OK 2023-01-11T22:12:51.9971003Z 2023-01-11T22:12:51.9971110Z Generating XML reports... 2023-01-11T22:12:51.9971560Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221218.xml 2023-01-11T22:12:51.9971937Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9972119Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9972498Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9972689Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9972708Z 2023-01-11T22:12:51.9972817Z Running tests... 2023-01-11T22:12:51.9973275Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9973601Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9973870Z test_verify_model_across_rank_with_logger (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9974090Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39965 2023-01-11T22:12:51.9974311Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39966 2023-01-11T22:12:51.9974684Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9974859Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9975234Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9975425Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9975794Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9975954Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9976327Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9976517Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9976764Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9977012Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9977412Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9977806Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9978124Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9978355Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9978580Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9978870Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9979278Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9979669Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9979910Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:12:51.9980149Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:12:51.9980542Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:12:51.9980929Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:12:51.9981213Z [1673475152.095982] [7e0e28e30a97:39966:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 
2023-01-11T22:12:51.9981445Z [1673475152.101796] [7e0e28e30a97:39966:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9981669Z [1673475152.101796] [7e0e28e30a97:39966:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9982047Z [1673475157.498842] [7e0e28e30a97:39966:0] tag_match.c:62 UCX WARN unexpected tag-receive descriptor 0x2637ab80 was not matched 2023-01-11T22:12:51.9982320Z [1673475152.087125] [7e0e28e30a97:39965:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9982546Z [1673475152.092508] [7e0e28e30a97:39965:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9982782Z [1673475152.092508] [7e0e28e30a97:39965:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9983096Z [1673475157.462287] [7e0e28e30a97:39965:1] ucc_schedule.h:189 UCC WARN timeout 5 sec. has expired on req 0x2335a2c0, seq_num 5, TL_UCP, team_id 1, size 2, rank 0, ctx_rank 0: Barrier n/a inplace=0 bytes=0 2023-01-11T22:12:51.9983366Z [1673475157.508923] [7e0e28e30a97:39965:0] mpool.c:55 UCX WARN object 0x2336b740 {flags:0x20040 recv length 0 host memory} was not returned to mpool ucp_requests 2023-01-11T22:12:51.9983471Z ok (10.542s) 2023-01-11T22:12:51.9983495Z 2023-01-11T22:12:51.9983768Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9983882Z Ran 1 test in 10.542s 2023-01-11T22:12:51.9983902Z 2023-01-11T22:12:51.9983977Z OK 2023-01-11T22:12:51.9983996Z 2023-01-11T22:12:51.9984121Z Generating XML reports... 
2023-01-11T22:12:51.9984572Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221227.xml 2023-01-11T22:12:51.9984945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9985124Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9985504Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9985696Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9985716Z 2023-01-11T22:12:51.9985882Z Running tests... 2023-01-11T22:12:51.9986136Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9986448Z Test results will be stored in test-reports/dist-ucc/distributed.test_distributed_spawn 2023-01-11T22:12:51.9986734Z test_verify_model_across_rank_without_logger (__main__.TestDistBackendWithSpawn) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:12:51.9986955Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40085 2023-01-11T22:12:51.9987222Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40086 2023-01-11T22:12:51.9987609Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9987785Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9988165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9988360Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9988711Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:12:51.9988885Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:12:51.9989261Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:12:51.9989460Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:12:51.9989707Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:12:51.9989952Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:12:51.9990352Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:12:51.9990748Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:12:51.9990979Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:12:51.9991189Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:12:51.9991425Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:12:51.9991667Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:12:51.9992061Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9992449Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:12:51.9992690Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:12:51.9992934Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:12:51.9993319Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:12:51.9993705Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:12:51.9993970Z [1673475165.159672] [7e0e28e30a97:40085:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9994205Z [1673475165.166178] [7e0e28e30a97:40085:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9994444Z [1673475165.166178] [7e0e28e30a97:40085:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9994760Z [1673475170.527951] [7e0e28e30a97:40085:1] ucc_schedule.h:189 UCC WARN timeout 5 sec. 
has expired on req 0x240cfa40, seq_num 5, TL_UCP, team_id 1, size 2, rank 0, ctx_rank 0: Barrier n/a inplace=0 bytes=0 2023-01-11T22:12:51.9995092Z [1673475170.564324] [7e0e28e30a97:40085:0] mpool.c:55 UCX WARN object 0x241e0ec0 {flags:0x20040 recv length 0 host memory} was not returned to mpool ucp_requests 2023-01-11T22:12:51.9995405Z [1673475165.161719] [7e0e28e30a97:40086:0] ec_cuda.c:343 cuda ec WARN CUDA cooperative groups are not supported. Fall back to non cooperative launch. 2023-01-11T22:12:51.9995639Z [1673475165.167944] [7e0e28e30a97:40086:0] parser.c:1993 UCX WARN unused environment variables: UCX_COMMIT; UCX_HOME 2023-01-11T22:12:51.9995877Z [1673475165.167944] [7e0e28e30a97:40086:0] parser.c:1993 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning) 2023-01-11T22:12:51.9996269Z [1673475170.574301] [7e0e28e30a97:40086:0] tag_match.c:62 UCX WARN unexpected tag-receive descriptor 0x278adf00 was not matched 2023-01-11T22:12:51.9996376Z ok (10.633s) 2023-01-11T22:12:51.9996397Z 2023-01-11T22:12:51.9996662Z ---------------------------------------------------------------------- 2023-01-11T22:12:51.9996760Z Ran 1 test in 10.633s 2023-01-11T22:12:51.9996778Z 2023-01-11T22:12:51.9996871Z OK 2023-01-11T22:12:51.9996891Z 2023-01-11T22:12:51.9997017Z Generating XML reports... 2023-01-11T22:12:51.9997467Z Generated XML report: test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221240.xml 2023-01-11T22:12:51.9997487Z 2023-01-11T22:12:51.9997938Z ##[endgroup] 2023-01-11T22:12:51.9998403Z FINISHED PRINTING LOG FILE of distributed/test_distributed_spawn (/var/lib/jenkins/workspace/test/test-reports/distributed-test_distributed_spawn_gmgwclcw) 2023-01-11T22:12:51.9998424Z 2023-01-11T22:12:51.9998689Z Running distributed/rpc/test_faulty_agent ... 
[2023-01-11 22:12:51.773125] 2023-01-11T22:12:51.9999190Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/rpc/test_faulty_agent.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:12:51.773470] 2023-01-11T22:12:54.3671529Z 2023-01-11T22:12:54.3672164Z Expand the folded group to see the log file of distributed/rpc/test_faulty_agent 2023-01-11T22:12:54.3673117Z ##[group]PRINTING LOG FILE of distributed/rpc/test_faulty_agent (/var/lib/jenkins/workspace/test/test-reports/distributed-rpc-test_faulty_agent_lavm5qzp) 2023-01-11T22:12:54.3673745Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdbitbwv2 2023-01-11T22:12:54.3674283Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdbitbwv2/_remote_module_non_scriptable.py 2023-01-11T22:12:54.3674601Z 2023-01-11T22:12:54.3674907Z ##[endgroup] 2023-01-11T22:12:54.3675621Z FINISHED PRINTING LOG FILE of distributed/rpc/test_faulty_agent (/var/lib/jenkins/workspace/test/test-reports/distributed-rpc-test_faulty_agent_lavm5qzp) 2023-01-11T22:12:54.3675970Z 2023-01-11T22:12:54.3676280Z Running distributed/pipeline/sync/test_stream ... [2023-01-11 22:12:54.367218] 2023-01-11T22:12:54.3676872Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/test_stream.py', '-v'] ... 
[2023-01-11 22:12:54.367460] 2023-01-11T22:13:00.2725543Z 2023-01-11T22:13:00.2726391Z Expand the folded group to see the log file of distributed/pipeline/sync/test_stream 2023-01-11T22:13:00.2728538Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/test_stream (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_stream_ocx4qlpl) 2023-01-11T22:13:00.2729174Z ============================= test session starts ============================== 2023-01-11T22:13:00.2730343Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:13:00.2731017Z cachedir: .pytest_cache 2023-01-11T22:13:00.2732210Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:13:00.2733180Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:13:00.2733525Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:13:00.2734117Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 2023-01-11T22:13:00.2734503Z collecting ... 
collected 19 items 2023-01-11T22:13:00.2737090Z Running 19 items in this shard: test/distributed/pipeline/sync/test_stream.py::TestNewStream::test_new_stream_cpu, test/distributed/pipeline/sync/test_stream.py::TestNewStream::test_new_stream_cuda, test/distributed/pipeline/sync/test_stream.py::TestCurrentStream::test_current_stream_cpu, test/distributed/pipeline/sync/test_stream.py::TestCurrentStream::test_current_stream_cuda, test/distributed/pipeline/sync/test_stream.py::TestDefaultStream::test_default_stream_cpu, test/distributed/pipeline/sync/test_stream.py::TestDefaultStream::test_default_stream_cuda, test/distributed/pipeline/sync/test_stream.py::TestUseDevice::test_use_device_cpu, test/distributed/pipeline/sync/test_stream.py::TestUseDevice::test_use_device_cuda, test/distributed/pipeline/sync/test_stream.py::TestUseStream::test_use_stream_cpu, test/distributed/pipeline/sync/test_stream.py::TestUseStream::test_use_stream_cuda, test/distributed/pipeline/sync/test_stream.py::TestGetDevice::test_get_device_cpu, test/distributed/pipeline/sync/test_stream.py::TestGetDevice::test_get_device_cuda, test/distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cpu_cpu, test/distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cpu_cuda, test/distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cuda_cpu, test/distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cuda_cuda, test/distributed/pipeline/sync/test_stream.py::TestRecordStream::test_record_stream_cpu, test/distributed/pipeline/sync/test_stream.py::TestRecordStream::test_record_stream_cuda, test/distributed/pipeline/sync/test_stream.py::TestRecordStream::test_record_stream_shifted_view 2023-01-11T22:13:00.2739511Z 2023-01-11T22:13:00.2739756Z distributed/pipeline/sync/test_stream.py::TestNewStream::test_new_stream_cpu PASSED [ 5%] 2023-01-11T22:13:00.2740217Z 
distributed/pipeline/sync/test_stream.py::TestNewStream::test_new_stream_cuda PASSED [ 10%] 2023-01-11T22:13:00.2740707Z distributed/pipeline/sync/test_stream.py::TestCurrentStream::test_current_stream_cpu PASSED [ 15%] 2023-01-11T22:13:00.2741212Z distributed/pipeline/sync/test_stream.py::TestCurrentStream::test_current_stream_cuda PASSED [ 21%] 2023-01-11T22:13:00.2741715Z distributed/pipeline/sync/test_stream.py::TestDefaultStream::test_default_stream_cpu PASSED [ 26%] 2023-01-11T22:13:00.2742194Z distributed/pipeline/sync/test_stream.py::TestDefaultStream::test_default_stream_cuda PASSED [ 31%] 2023-01-11T22:13:00.2742677Z distributed/pipeline/sync/test_stream.py::TestUseDevice::test_use_device_cpu PASSED [ 36%] 2023-01-11T22:13:00.2743148Z distributed/pipeline/sync/test_stream.py::TestUseDevice::test_use_device_cuda PASSED [ 42%] 2023-01-11T22:13:00.2743622Z distributed/pipeline/sync/test_stream.py::TestUseStream::test_use_stream_cpu PASSED [ 47%] 2023-01-11T22:13:00.2744072Z distributed/pipeline/sync/test_stream.py::TestUseStream::test_use_stream_cuda PASSED [ 52%] 2023-01-11T22:13:00.2744542Z distributed/pipeline/sync/test_stream.py::TestGetDevice::test_get_device_cpu PASSED [ 57%] 2023-01-11T22:13:00.2745010Z distributed/pipeline/sync/test_stream.py::TestGetDevice::test_get_device_cuda PASSED [ 63%] 2023-01-11T22:13:00.2745479Z distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cpu_cpu PASSED [ 68%] 2023-01-11T22:13:00.2745973Z distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cpu_cuda PASSED [ 73%] 2023-01-11T22:13:00.2746460Z distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cuda_cpu PASSED [ 78%] 2023-01-11T22:13:00.2746947Z distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cuda_cuda PASSED [ 84%] 2023-01-11T22:13:00.2747549Z distributed/pipeline/sync/test_stream.py::TestRecordStream::test_record_stream_cpu PASSED [ 89%] 2023-01-11T22:13:00.2748042Z 
distributed/pipeline/sync/test_stream.py::TestRecordStream::test_record_stream_cuda PASSED [ 94%] 2023-01-11T22:13:00.2748552Z distributed/pipeline/sync/test_stream.py::TestRecordStream::test_record_stream_shifted_view PASSED [100%] 2023-01-11T22:13:00.2748837Z 2023-01-11T22:13:00.2749048Z ============================== 19 passed in 2.87s ============================== 2023-01-11T22:13:00.2749254Z 2023-01-11T22:13:00.2749569Z ##[endgroup] 2023-01-11T22:13:00.2750221Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/test_stream (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_stream_ocx4qlpl) 2023-01-11T22:13:00.2750604Z 2023-01-11T22:13:00.2750887Z Running distributed/pipeline/sync/test_phony ... [2023-01-11 22:13:00.272622] 2023-01-11T22:13:00.2751476Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/test_phony.py', '-v'] ... [2023-01-11 22:13:00.272861] 2023-01-11T22:13:02.6604046Z 2023-01-11T22:13:02.6604596Z Expand the folded group to see the log file of distributed/pipeline/sync/test_phony 2023-01-11T22:13:02.6605858Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/test_phony (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_phony_rxnojwij) 2023-01-11T22:13:02.6606601Z ============================= test session starts ============================== 2023-01-11T22:13:02.6607463Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:13:02.6607862Z cachedir: .pytest_cache 2023-01-11T22:13:02.6608746Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:13:02.6609311Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:13:02.6609691Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:13:02.6610462Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 
2023-01-11T22:13:02.6610865Z collecting ... collected 4 items 2023-01-11T22:13:02.6611546Z Running 4 items in this shard: test/distributed/pipeline/sync/test_phony.py::test_phony_size, test/distributed/pipeline/sync/test_phony.py::test_phony_requires_grad, test/distributed/pipeline/sync/test_phony.py::test_cached_phony, test/distributed/pipeline/sync/test_phony.py::test_phony_in_autograd_function 2023-01-11T22:13:02.6612277Z 2023-01-11T22:13:02.6612501Z distributed/pipeline/sync/test_phony.py::test_phony_size PASSED [ 25%] 2023-01-11T22:13:02.6614014Z distributed/pipeline/sync/test_phony.py::test_phony_requires_grad PASSED [ 50%] 2023-01-11T22:13:02.6614577Z distributed/pipeline/sync/test_phony.py::test_cached_phony PASSED [ 75%] 2023-01-11T22:13:02.6615010Z distributed/pipeline/sync/test_phony.py::test_phony_in_autograd_function PASSED [100%] 2023-01-11T22:13:02.6615271Z 2023-01-11T22:13:02.6615432Z ============================== 4 passed in 0.06s =============================== 2023-01-11T22:13:02.6615668Z 2023-01-11T22:13:02.6616134Z ##[endgroup] 2023-01-11T22:13:02.6616763Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/test_phony (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_phony_rxnojwij) 2023-01-11T22:13:02.6617141Z 2023-01-11T22:13:02.6617521Z Running distributed/pipeline/sync/test_dependency ... [2023-01-11 22:13:02.660526] 2023-01-11T22:13:02.6618260Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/test_dependency.py', '-v'] ... 
[2023-01-11 22:13:02.660776] 2023-01-11T22:13:06.4277886Z 2023-01-11T22:13:06.4278973Z Expand the folded group to see the log file of distributed/pipeline/sync/test_dependency 2023-01-11T22:13:06.4280261Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/test_dependency (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_dependency_qdt2ypde) 2023-01-11T22:13:06.4281086Z ============================= test session starts ============================== 2023-01-11T22:13:06.4281701Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:13:06.4282085Z cachedir: .pytest_cache 2023-01-11T22:13:06.4282640Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:13:06.4283084Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:13:06.4283516Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:13:06.4284088Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 2023-01-11T22:13:06.4284721Z collecting ... 
collected 6 items 2023-01-11T22:13:06.4285645Z Running 6 items in this shard: test/distributed/pipeline/sync/test_dependency.py::test_fork_join, test/distributed/pipeline/sync/test_dependency.py::test_fork_join_enable_grad, test/distributed/pipeline/sync/test_dependency.py::test_fork_join_no_grad, test/distributed/pipeline/sync/test_dependency.py::test_fork_leak, test/distributed/pipeline/sync/test_dependency.py::test_join_when_fork_not_requires_grad, test/distributed/pipeline/sync/test_dependency.py::test_join_when_fork_requires_grad 2023-01-11T22:13:06.4286403Z 2023-01-11T22:13:06.4286621Z distributed/pipeline/sync/test_dependency.py::test_fork_join PASSED [ 16%] 2023-01-11T22:13:06.4287077Z distributed/pipeline/sync/test_dependency.py::test_fork_join_enable_grad PASSED [ 33%] 2023-01-11T22:13:06.4287544Z distributed/pipeline/sync/test_dependency.py::test_fork_join_no_grad PASSED [ 50%] 2023-01-11T22:13:06.4287977Z distributed/pipeline/sync/test_dependency.py::test_fork_leak PASSED [ 66%] 2023-01-11T22:13:06.4288448Z distributed/pipeline/sync/test_dependency.py::test_join_when_fork_not_requires_grad PASSED [ 83%] 2023-01-11T22:13:06.4288935Z distributed/pipeline/sync/test_dependency.py::test_join_when_fork_requires_grad PASSED [100%] 2023-01-11T22:13:06.4289202Z 2023-01-11T22:13:06.4289349Z ============================== 6 passed in 1.37s =============================== 2023-01-11T22:13:06.4289545Z 2023-01-11T22:13:06.4289870Z ##[endgroup] 2023-01-11T22:13:06.4290538Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/test_dependency (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_dependency_qdt2ypde) 2023-01-11T22:13:06.4290931Z 2023-01-11T22:13:06.4291223Z Running distributed/pipeline/sync/test_checkpoint ... [2023-01-11 22:13:06.427802] 2023-01-11T22:13:06.4291841Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/test_checkpoint.py', '-v'] ... 
[2023-01-11 22:13:06.428151] 2023-01-11T22:13:10.2377436Z 2023-01-11T22:13:10.2377963Z Expand the folded group to see the log file of distributed/pipeline/sync/test_checkpoint 2023-01-11T22:13:10.2378974Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/test_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_checkpoint_bcipynyz) 2023-01-11T22:13:10.2379548Z ============================= test session starts ============================== 2023-01-11T22:13:10.2380165Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:13:10.2380503Z cachedir: .pytest_cache 2023-01-11T22:13:10.2381078Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:13:10.2381515Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:13:10.2381852Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:13:10.2382406Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 2023-01-11T22:13:10.2382811Z collecting ... 
collected 9 items 2023-01-11T22:13:10.2384380Z Running 9 items in this shard: test/distributed/pipeline/sync/test_checkpoint.py::test_serial_checkpoints[cpu], test/distributed/pipeline/sync/test_checkpoint.py::test_serial_checkpoints[cuda], test/distributed/pipeline/sync/test_checkpoint.py::test_not_requires_grad, test/distributed/pipeline/sync/test_checkpoint.py::test_not_requires_grad_with_parameter, test/distributed/pipeline/sync/test_checkpoint.py::test_random_in_checkpoint[cpu], test/distributed/pipeline/sync/test_checkpoint.py::test_random_in_checkpoint[cuda], test/distributed/pipeline/sync/test_checkpoint.py::test_detect_checkpointing_recomputing, test/distributed/pipeline/sync/test_checkpoint.py::test_detect_checkpointing_recomputing_without_checkpoint, test/distributed/pipeline/sync/test_checkpoint.py::test_non_grad_output 2023-01-11T22:13:10.2385646Z 2023-01-11T22:13:10.2385890Z distributed/pipeline/sync/test_checkpoint.py::test_serial_checkpoints[cpu] PASSED [ 11%] 2023-01-11T22:13:10.2386374Z distributed/pipeline/sync/test_checkpoint.py::test_serial_checkpoints[cuda] PASSED [ 22%] 2023-01-11T22:13:10.2386832Z distributed/pipeline/sync/test_checkpoint.py::test_not_requires_grad PASSED [ 33%] 2023-01-11T22:13:10.2387321Z distributed/pipeline/sync/test_checkpoint.py::test_not_requires_grad_with_parameter PASSED [ 44%] 2023-01-11T22:13:10.2387817Z distributed/pipeline/sync/test_checkpoint.py::test_random_in_checkpoint[cpu] PASSED [ 55%] 2023-01-11T22:13:10.2388297Z distributed/pipeline/sync/test_checkpoint.py::test_random_in_checkpoint[cuda] PASSED [ 66%] 2023-01-11T22:13:10.2388773Z distributed/pipeline/sync/test_checkpoint.py::test_detect_checkpointing_recomputing PASSED [ 77%] 2023-01-11T22:13:10.2389318Z distributed/pipeline/sync/test_checkpoint.py::test_detect_checkpointing_recomputing_without_checkpoint PASSED [ 88%] 2023-01-11T22:13:10.2389824Z distributed/pipeline/sync/test_checkpoint.py::test_non_grad_output PASSED [100%] 
2023-01-11T22:13:10.2390077Z 2023-01-11T22:13:10.2390235Z ============================== 9 passed in 1.39s =============================== 2023-01-11T22:13:10.2390410Z 2023-01-11T22:13:10.2390729Z ##[endgroup] 2023-01-11T22:13:10.2391404Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/test_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_checkpoint_bcipynyz) 2023-01-11T22:13:10.2391805Z 2023-01-11T22:13:10.2392124Z Running distributed/pipeline/sync/skip/test_verify_skippables ... [2023-01-11 22:13:10.237823] 2023-01-11T22:13:10.2392765Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_verify_skippables.py', '-v'] ... [2023-01-11 22:13:10.238091] 2023-01-11T22:13:12.5932149Z 2023-01-11T22:13:12.5932680Z Expand the folded group to see the log file of distributed/pipeline/sync/skip/test_verify_skippables 2023-01-11T22:13:12.5934518Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/skip/test_verify_skippables (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_verify_skippables_v7bsv90j) 2023-01-11T22:13:12.5935104Z ============================= test session starts ============================== 2023-01-11T22:13:12.5935711Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:13:12.5936088Z cachedir: .pytest_cache 2023-01-11T22:13:12.5936644Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:13:12.5937082Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:13:12.5937416Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:13:12.5937972Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 2023-01-11T22:13:12.5938379Z collecting ... 
collected 9 items 2023-01-11T22:13:12.5939715Z Running 9 items in this shard: test/distributed/pipeline/sync/skip/test_verify_skippables.py::test_matching, test/distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_not_pop, test/distributed/pipeline/sync/skip/test_verify_skippables.py::test_pop_unknown, test/distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_again, test/distributed/pipeline/sync/skip/test_verify_skippables.py::test_pop_again, test/distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_pop_together_different_names, test/distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_pop_together_same_name, test/distributed/pipeline/sync/skip/test_verify_skippables.py::test_double_stash_pop, test/distributed/pipeline/sync/skip/test_verify_skippables.py::test_double_stash_pop_but_isolated 2023-01-11T22:13:12.5941087Z 2023-01-11T22:13:12.5941405Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_matching PASSED [ 11%] 2023-01-11T22:13:12.5941911Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_not_pop PASSED [ 22%] 2023-01-11T22:13:12.5942392Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_pop_unknown PASSED [ 33%] 2023-01-11T22:13:12.5942851Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_again PASSED [ 44%] 2023-01-11T22:13:12.5943322Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_pop_again PASSED [ 55%] 2023-01-11T22:13:12.5943840Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_pop_together_different_names PASSED [ 66%] 2023-01-11T22:13:12.5944349Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_pop_together_same_name PASSED [ 77%] 2023-01-11T22:13:12.5944853Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_double_stash_pop PASSED [ 88%] 2023-01-11T22:13:12.5945374Z 
distributed/pipeline/sync/skip/test_verify_skippables.py::test_double_stash_pop_but_isolated PASSED [100%] 2023-01-11T22:13:12.5945666Z 2023-01-11T22:13:12.5945831Z ============================== 9 passed in 0.05s =============================== 2023-01-11T22:13:12.5946028Z 2023-01-11T22:13:12.5947934Z ##[endgroup] 2023-01-11T22:13:12.5948704Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/skip/test_verify_skippables (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_verify_skippables_v7bsv90j) 2023-01-11T22:13:12.5949134Z 2023-01-11T22:13:12.5949432Z Running distributed/pipeline/sync/skip/test_portal ... [2023-01-11 22:13:12.593378] 2023-01-11T22:13:12.5950069Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_portal.py', '-v'] ... [2023-01-11 22:13:12.593616] 2023-01-11T22:13:16.4233550Z 2023-01-11T22:13:16.4234321Z Expand the folded group to see the log file of distributed/pipeline/sync/skip/test_portal 2023-01-11T22:13:16.4236083Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/skip/test_portal (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_portal_4t0vgsid) 2023-01-11T22:13:16.4237332Z ============================= test session starts ============================== 2023-01-11T22:13:16.4238134Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:13:16.4238476Z cachedir: .pytest_cache 2023-01-11T22:13:16.4239049Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:13:16.4239497Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:13:16.4239811Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:13:16.4240380Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 2023-01-11T22:13:16.4240780Z collecting ... 
collected 10 items 2023-01-11T22:13:16.4242161Z Running 10 items in this shard: test/distributed/pipeline/sync/skip/test_portal.py::test_copy_returns_on_next_device, test/distributed/pipeline/sync/skip/test_portal.py::test_blue_orange, test/distributed/pipeline/sync/skip/test_portal.py::test_blue_orange_not_requires_grad, test/distributed/pipeline/sync/skip/test_portal.py::test_use_grad, test/distributed/pipeline/sync/skip/test_portal.py::TestTensorLife::test_tensor_life_0, test/distributed/pipeline/sync/skip/test_portal.py::TestTensorLife::test_tensor_life_1, test/distributed/pipeline/sync/skip/test_portal.py::TestTensorLife::test_tensor_life_2, test/distributed/pipeline/sync/skip/test_portal.py::TestTensorLife::test_tensor_life_3, test/distributed/pipeline/sync/skip/test_portal.py::TestTensorLife::test_tensor_life_4, test/distributed/pipeline/sync/skip/test_portal.py::TestTensorLife::test_tensor_life_3_plus_1 2023-01-11T22:13:16.4243557Z 2023-01-11T22:13:16.4243800Z distributed/pipeline/sync/skip/test_portal.py::test_copy_returns_on_next_device PASSED [ 10%] 2023-01-11T22:13:16.4244263Z distributed/pipeline/sync/skip/test_portal.py::test_blue_orange PASSED [ 20%] 2023-01-11T22:13:16.4244811Z distributed/pipeline/sync/skip/test_portal.py::test_blue_orange_not_requires_grad PASSED [ 30%] 2023-01-11T22:13:16.4245270Z distributed/pipeline/sync/skip/test_portal.py::test_use_grad PASSED [ 40%] 2023-01-11T22:13:16.4245751Z distributed/pipeline/sync/skip/test_portal.py::TestTensorLife::test_tensor_life_0 PASSED [ 50%] 2023-01-11T22:13:16.4246232Z distributed/pipeline/sync/skip/test_portal.py::TestTensorLife::test_tensor_life_1 PASSED [ 60%] 2023-01-11T22:13:16.4246688Z distributed/pipeline/sync/skip/test_portal.py::TestTensorLife::test_tensor_life_2 PASSED [ 70%] 2023-01-11T22:13:16.4247173Z distributed/pipeline/sync/skip/test_portal.py::TestTensorLife::test_tensor_life_3 PASSED [ 80%] 2023-01-11T22:13:16.4247638Z 
distributed/pipeline/sync/skip/test_portal.py::TestTensorLife::test_tensor_life_4 PASSED [ 90%] 2023-01-11T22:13:16.4248134Z distributed/pipeline/sync/skip/test_portal.py::TestTensorLife::test_tensor_life_3_plus_1 PASSED [100%] 2023-01-11T22:13:16.4248410Z 2023-01-11T22:13:16.4248556Z ============================== 10 passed in 1.40s ============================== 2023-01-11T22:13:16.4248753Z 2023-01-11T22:13:16.4249071Z ##[endgroup] 2023-01-11T22:13:16.4249751Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/skip/test_portal (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_portal_4t0vgsid) 2023-01-11T22:13:16.4250150Z 2023-01-11T22:13:16.4250422Z Running distributed/pipeline/sync/skip/test_gpipe ... [2023-01-11 22:13:16.423395] 2023-01-11T22:13:16.4251050Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_gpipe.py', '-v'] ... [2023-01-11 22:13:16.423641] 2023-01-11T22:13:23.9937666Z 2023-01-11T22:13:23.9938197Z Expand the folded group to see the log file of distributed/pipeline/sync/skip/test_gpipe 2023-01-11T22:13:23.9939234Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/skip/test_gpipe (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_gpipe_ybmgqoj9) 2023-01-11T22:13:23.9939806Z ============================= test session starts ============================== 2023-01-11T22:13:23.9940401Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:13:23.9940762Z cachedir: .pytest_cache 2023-01-11T22:13:23.9941339Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:13:23.9941758Z torch: 2.0.0a0+git8419ddd 2023-01-11T22:13:23.9942092Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:13:23.9942690Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, 
xdoctest-1.1.0 2023-01-11T22:13:23.9943101Z collecting ... collected 13 items 2023-01-11T22:13:23.9945325Z Running 13 items in this shard: test/distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-3], test/distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-1:2], test/distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-2:1], test/distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-1:1:1], test/distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-3], test/distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-1:2], test/distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-2:1], test/distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-1:1:1], test/distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-3], test/distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-1:2], test/distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-2:1], test/distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-1:1:1], test/distributed/pipeline/sync/skip/test_gpipe.py::test_none_skip 2023-01-11T22:13:23.9946926Z 2023-01-11T22:13:23.9947248Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-3] PASSED [ 7%] 2023-01-11T22:13:23.9947784Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-1:2] PASSED [ 15%] 2023-01-11T22:13:23.9948405Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-2:1] PASSED [ 23%] 2023-01-11T22:13:23.9948945Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-1:1:1] SKIPPED [ 30%] 2023-01-11T22:13:23.9949477Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-3] PASSED [ 38%] 2023-01-11T22:13:23.9950009Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-1:2] PASSED [ 46%] 2023-01-11T22:13:23.9950536Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-2:1] PASSED [ 53%] 2023-01-11T22:13:23.9951060Z 
distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-1:1:1] SKIPPED [ 61%] 2023-01-11T22:13:23.9951604Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-3] PASSED [ 69%] 2023-01-11T22:13:23.9952148Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-1:2] PASSED [ 76%] 2023-01-11T22:13:23.9952670Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-2:1] PASSED [ 84%] 2023-01-11T22:13:23.9953229Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-1:1:1] SKIPPED [ 92%] 2023-01-11T22:13:23.9953682Z distributed/pipeline/sync/skip/test_gpipe.py::test_none_skip PASSED [100%] 2023-01-11T22:13:23.9953929Z 2023-01-11T22:13:23.9954101Z ======================== 10 passed, 3 skipped in 4.86s ========================= 2023-01-11T22:13:23.9954303Z 2023-01-11T22:13:23.9954599Z ##[endgroup] 2023-01-11T22:13:23.9955254Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/skip/test_gpipe (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_gpipe_ybmgqoj9) 2023-01-11T22:13:23.9955656Z 2023-01-11T22:13:23.9955964Z Running distributed/optim/test_apply_optimizer_in_backward ... [2023-01-11 22:13:23.993928] 2023-01-11T22:13:23.9956687Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/optim/test_apply_optimizer_in_backward.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:13:23.994202] 2023-01-11T22:13:26.1101813Z 2023-01-11T22:13:26.1102493Z Expand the folded group to see the log file of distributed/optim/test_apply_optimizer_in_backward 2023-01-11T22:13:26.1103522Z ##[group]PRINTING LOG FILE of distributed/optim/test_apply_optimizer_in_backward (/var/lib/jenkins/workspace/test/test-reports/distributed-optim-test_apply_optimizer_in_backward_3pzdg17g) 2023-01-11T22:13:26.1103938Z 2023-01-11T22:13:26.1104224Z ##[endgroup] 2023-01-11T22:13:26.1105026Z FINISHED PRINTING LOG FILE of distributed/optim/test_apply_optimizer_in_backward (/var/lib/jenkins/workspace/test/test-reports/distributed-optim-test_apply_optimizer_in_backward_3pzdg17g) 2023-01-11T22:13:26.1105443Z 2023-01-11T22:13:26.1105732Z Running distributed/elastic/events/lib_test ... [2023-01-11 22:13:26.110288] 2023-01-11T22:13:26.1108517Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/elastic/events/lib_test.py', '-v'] ... [2023-01-11 22:13:26.110562] 2023-01-11T22:13:28.6310675Z 2023-01-11T22:13:28.6311436Z Expand the folded group to see the log file of distributed/elastic/events/lib_test 2023-01-11T22:13:28.6313008Z ##[group]PRINTING LOG FILE of distributed/elastic/events/lib_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-events-lib_test_87u9guxl) 2023-01-11T22:13:28.6314086Z ============================= test session starts ============================== 2023-01-11T22:13:28.6314715Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python 2023-01-11T22:13:28.6315053Z cachedir: .pytest_cache 2023-01-11T22:13:28.6316241Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2023-01-11T22:13:28.6316753Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2023-01-11T22:13:28.6317332Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0 
2023-01-11T22:13:28.6317716Z collecting ... collected 8 items 2023-01-11T22:13:28.6319087Z Running 8 items in this shard: test/distributed/elastic/events/lib_test.py::EventLibTest::test_event_created, test/distributed/elastic/events/lib_test.py::EventLibTest::test_event_deser, test/distributed/elastic/events/lib_test.py::EventLibTest::test_get_or_create_logger, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_construct_and_record_rdzv_event, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_construct_and_record_rdzv_event_does_not_run_if_invalid_dest, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_created, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_deserialize, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_str 2023-01-11T22:13:28.6320177Z 2023-01-11T22:13:28.6320409Z distributed/elastic/events/lib_test.py::EventLibTest::test_event_created PASSED [ 12%] 2023-01-11T22:13:28.6320869Z distributed/elastic/events/lib_test.py::EventLibTest::test_event_deser PASSED [ 25%] 2023-01-11T22:13:28.6321341Z distributed/elastic/events/lib_test.py::EventLibTest::test_get_or_create_logger PASSED [ 37%] 2023-01-11T22:13:28.6321826Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_construct_and_record_rdzv_event PASSED [ 50%] 2023-01-11T22:13:28.6322384Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_construct_and_record_rdzv_event_does_not_run_if_invalid_dest PASSED [ 62%] 2023-01-11T22:13:28.6322943Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_created PASSED [ 75%] 2023-01-11T22:13:28.6323446Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_deserialize PASSED [ 87%] 2023-01-11T22:13:28.6323916Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_str PASSED [100%] 2023-01-11T22:13:28.6324181Z 2023-01-11T22:13:28.6324340Z 
============================== 8 passed in 1.77s =============================== 2023-01-11T22:13:28.6324534Z 2023-01-11T22:13:28.6324851Z ##[endgroup] 2023-01-11T22:13:28.6325488Z FINISHED PRINTING LOG FILE of distributed/elastic/events/lib_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-events-lib_test_87u9guxl) 2023-01-11T22:13:28.6325834Z 2023-01-11T22:13:28.6326159Z Running distributed/_shard/test_replicated_tensor ... [2023-01-11 22:13:28.631168] 2023-01-11T22:13:28.6326864Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/test_replicated_tensor.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:13:28.631430] 2023-01-11T22:13:30.7361873Z 2023-01-11T22:13:30.7362455Z Expand the folded group to see the log file of distributed/_shard/test_replicated_tensor 2023-01-11T22:13:30.7363733Z ##[group]PRINTING LOG FILE of distributed/_shard/test_replicated_tensor (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-test_replicated_tensor_0dof7mxg) 2023-01-11T22:13:30.7364125Z 2023-01-11T22:13:30.7364440Z ##[endgroup] 2023-01-11T22:13:30.7365199Z FINISHED PRINTING LOG FILE of distributed/_shard/test_replicated_tensor (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-test_replicated_tensor_0dof7mxg) 2023-01-11T22:13:30.7365553Z 2023-01-11T22:13:30.7365928Z Running distributed/_composable/test_checkpoint ... [2023-01-11 22:13:30.736258] 2023-01-11T22:13:30.7367592Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_composable/test_checkpoint.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:13:30.736506] 2023-01-11T22:13:35.0907364Z 2023-01-11T22:13:35.0907995Z Expand the folded group to see the log file of distributed/_composable/test_checkpoint 2023-01-11T22:13:35.0909308Z ##[group]PRINTING LOG FILE of distributed/_composable/test_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-_composable-test_checkpoint_anapqtap) 2023-01-11T22:13:35.0909718Z 2023-01-11T22:13:35.0909833Z Running tests... 2023-01-11T22:13:35.0910357Z ---------------------------------------------------------------------- 2023-01-11T22:13:35.0911025Z Test results will be stored in test-reports/python-unittest/distributed._composable.test_checkpoint 2023-01-11T22:13:35.0911487Z test_random_cpu (__main__.TestCheckpoint) ... ok (0.019s) 2023-01-11T22:13:35.0911894Z test_tensor_only_cpu_use_reentrant_False (__main__.TestCheckpoint) ... ok (0.005s) 2023-01-11T22:13:35.0912326Z test_tensor_only_cpu_use_reentrant_True (__main__.TestCheckpoint) ... ok (0.005s) 2023-01-11T22:13:35.0912744Z test_tensor_only_gpu_use_reentrant_False (__main__.TestCheckpoint) ... ok (0.408s) 2023-01-11T22:13:35.0913181Z test_tensor_only_gpu_use_reentrant_True (__main__.TestCheckpoint) ... ok (0.007s) 2023-01-11T22:13:35.0913426Z 2023-01-11T22:13:35.0913711Z ---------------------------------------------------------------------- 2023-01-11T22:13:35.0914024Z Ran 5 tests in 0.443s 2023-01-11T22:13:35.0914188Z 2023-01-11T22:13:35.0914284Z OK 2023-01-11T22:13:35.0914421Z 2023-01-11T22:13:35.0914548Z Generating XML reports... 
2023-01-11T22:13:35.0915157Z Generated XML report: test-reports/python-unittest/distributed._composable.test_checkpoint/TEST-TestCheckpoint-20230111221334.xml 2023-01-11T22:13:35.0915489Z 2023-01-11T22:13:35.0915799Z ##[endgroup] 2023-01-11T22:13:35.0916422Z FINISHED PRINTING LOG FILE of distributed/_composable/test_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-_composable-test_checkpoint_anapqtap) 2023-01-11T22:13:35.0916795Z 2023-01-11T22:13:35.0917056Z Running distributed/test_nccl ... [2023-01-11 22:13:35.090804] 2023-01-11T22:13:35.0917674Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_nccl.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:13:35.091056] 2023-01-11T22:13:40.2197692Z 2023-01-11T22:13:40.2198767Z Expand the folded group to see the log file of distributed/test_nccl 2023-01-11T22:13:40.2200198Z ##[group]PRINTING LOG FILE of distributed/test_nccl (/var/lib/jenkins/workspace/test/test-reports/distributed-test_nccl_haujs8v9) 2023-01-11T22:13:40.2200549Z 2023-01-11T22:13:40.2200666Z Running tests... 2023-01-11T22:13:40.2201186Z ---------------------------------------------------------------------- 2023-01-11T22:13:40.2201726Z Test results will be stored in test-reports/python-unittest/distributed.test_nccl 2023-01-11T22:13:40.2202156Z test_all_gather_cuda_bfloat16 (__main__.TestNCCLCUDA) ... ok (1.148s) 2023-01-11T22:13:40.2202534Z test_all_gather_cuda_float32 (__main__.TestNCCLCUDA) ... ok (0.003s) 2023-01-11T22:13:40.2202922Z test_all_reduce_cuda_bfloat16 (__main__.TestNCCLCUDA) ... ok (0.004s) 2023-01-11T22:13:40.2203312Z test_all_reduce_cuda_float32 (__main__.TestNCCLCUDA) ... ok (0.004s) 2023-01-11T22:13:40.2203879Z test_broadcast_cuda_bfloat16 (__main__.TestNCCLCUDA) ... ok (0.003s) 2023-01-11T22:13:40.2204522Z test_broadcast_cuda_float32 (__main__.TestNCCLCUDA) ... ok (0.003s) 2023-01-11T22:13:40.2205103Z test_collective_errors_cuda (__main__.TestNCCLCUDA) ... 
ok (0.002s) 2023-01-11T22:13:40.2205974Z test_reduce_cuda_bfloat16 (__main__.TestNCCLCUDA) ... ok (0.002s) 2023-01-11T22:13:40.2206781Z test_reduce_cuda_float32 (__main__.TestNCCLCUDA) ... ok (0.002s) 2023-01-11T22:13:40.2207251Z test_reduce_scatter_cuda_bfloat16 (__main__.TestNCCLCUDA) ... ok (0.004s) 2023-01-11T22:13:40.2207653Z test_reduce_scatter_cuda_float32 (__main__.TestNCCLCUDA) ... ok (0.003s) 2023-01-11T22:13:40.2208011Z test_unique_id_cuda (__main__.TestNCCLCUDA) ... ok (0.001s) 2023-01-11T22:13:40.2208222Z 2023-01-11T22:13:40.2208516Z ---------------------------------------------------------------------- 2023-01-11T22:13:40.2209108Z Ran 12 tests in 1.181s 2023-01-11T22:13:40.2209273Z 2023-01-11T22:13:40.2209369Z OK 2023-01-11T22:13:40.2209504Z 2023-01-11T22:13:40.2209635Z Generating XML reports... 2023-01-11T22:13:40.2210212Z Generated XML report: test-reports/python-unittest/distributed.test_nccl/TEST-TestNCCLCUDA-20230111221338.xml 2023-01-11T22:13:40.2210532Z 2023-01-11T22:13:40.2210864Z ##[endgroup] 2023-01-11T22:13:40.2211497Z FINISHED PRINTING LOG FILE of distributed/test_nccl (/var/lib/jenkins/workspace/test/test-reports/distributed-test_nccl_haujs8v9) 2023-01-11T22:13:40.2211833Z 2023-01-11T22:13:40.2212122Z Running distributed/checkpoint/test_traverse ... [2023-01-11 22:13:40.219867] 2023-01-11T22:13:40.2212813Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/checkpoint/test_traverse.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:13:40.220142] 2023-01-11T22:13:44.1326883Z 2023-01-11T22:13:44.1327773Z Expand the folded group to see the log file of distributed/checkpoint/test_traverse 2023-01-11T22:13:44.1329008Z ##[group]PRINTING LOG FILE of distributed/checkpoint/test_traverse (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_traverse__zzdmdvx) 2023-01-11T22:13:44.1329411Z 2023-01-11T22:13:44.1329529Z Running tests... 
2023-01-11T22:13:44.1330025Z ---------------------------------------------------------------------- 2023-01-11T22:13:44.1330600Z Test results will be stored in test-reports/python-unittest/distributed.checkpoint.test_traverse 2023-01-11T22:13:44.1331060Z test_get_element (__main__.TestTraverse) ... ok (1.624s) 2023-01-11T22:13:44.1331396Z test_set_element (__main__.TestTraverse) ... ok (0.002s) 2023-01-11T22:13:44.1331807Z test_traverse_doesnt_ignore_intermediate_collections (__main__.TestTraverse) ... ok (0.002s) 2023-01-11T22:13:44.1332227Z test_traverse_nested_dict (__main__.TestTraverse) ... ok (0.001s) 2023-01-11T22:13:44.1332592Z test_traverse_nested_list (__main__.TestTraverse) ... ok (0.002s) 2023-01-11T22:13:44.1333202Z test_traverse_shallow (__main__.TestTraverse) ... ok (0.002s) 2023-01-11T22:13:44.1333597Z test_traverse_with_ordered_dict (__main__.TestTraverse) ... ok (0.001s) 2023-01-11T22:13:44.1333827Z 2023-01-11T22:13:44.1334110Z ---------------------------------------------------------------------- 2023-01-11T22:13:44.1334427Z Ran 7 tests in 1.635s 2023-01-11T22:13:44.1334592Z 2023-01-11T22:13:44.1334691Z OK 2023-01-11T22:13:44.1334829Z 2023-01-11T22:13:44.1334955Z Generating XML reports... 2023-01-11T22:13:44.1335548Z Generated XML report: test-reports/python-unittest/distributed.checkpoint.test_traverse/TEST-TestTraverse-20230111221342.xml 2023-01-11T22:13:44.1335896Z 2023-01-11T22:13:44.1336214Z ##[endgroup] 2023-01-11T22:13:44.1336835Z FINISHED PRINTING LOG FILE of distributed/checkpoint/test_traverse (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_traverse__zzdmdvx) 2023-01-11T22:13:44.1337205Z 2023-01-11T22:13:44.1337494Z Running distributed/nn/jit/test_instantiator ... [2023-01-11 22:13:44.132739] 2023-01-11T22:13:44.1338174Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/nn/jit/test_instantiator.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:13:44.132988] 2023-01-11T22:13:47.9950470Z 2023-01-11T22:13:47.9951391Z Expand the folded group to see the log file of distributed/nn/jit/test_instantiator 2023-01-11T22:13:47.9953377Z ##[group]PRINTING LOG FILE of distributed/nn/jit/test_instantiator (/var/lib/jenkins/workspace/test/test-reports/distributed-nn-jit-test_instantiator_4gln977k) 2023-01-11T22:13:47.9954154Z 2023-01-11T22:13:47.9954280Z Running tests... 2023-01-11T22:13:47.9954909Z ---------------------------------------------------------------------- 2023-01-11T22:13:47.9956147Z Test results will be stored in test-reports/python-unittest/distributed.nn.jit.test_instantiator 2023-01-11T22:13:47.9956641Z test_get_arg_return_types_from_interface (__main__.TestInstantiator) ... ok (1.616s) 2023-01-11T22:13:47.9957086Z test_instantiate_non_scripted_remote_module_template (__main__.TestInstantiator) ... ok (0.002s) 2023-01-11T22:13:47.9958429Z test_instantiate_scripted_remote_module_template (__main__.TestInstantiator) ... ok (0.014s) 2023-01-11T22:13:47.9958830Z 2023-01-11T22:13:47.9959122Z ---------------------------------------------------------------------- 2023-01-11T22:13:47.9959438Z Ran 3 tests in 1.632s 2023-01-11T22:13:47.9959603Z 2023-01-11T22:13:47.9959698Z OK 2023-01-11T22:13:47.9959833Z 2023-01-11T22:13:47.9960080Z Generating XML reports... 2023-01-11T22:13:47.9960720Z Generated XML report: test-reports/python-unittest/distributed.nn.jit.test_instantiator/TEST-TestInstantiator-20230111221345.xml 2023-01-11T22:13:47.9961053Z 2023-01-11T22:13:47.9961378Z ##[endgroup] 2023-01-11T22:13:47.9961993Z FINISHED PRINTING LOG FILE of distributed/nn/jit/test_instantiator (/var/lib/jenkins/workspace/test/test-reports/distributed-nn-jit-test_instantiator_4gln977k) 2023-01-11T22:13:47.9962355Z 2023-01-11T22:13:47.9962635Z Running distributed/checkpoint/test_utils ... 
[2023-01-11 22:13:47.995217] 2023-01-11T22:13:47.9963301Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/checkpoint/test_utils.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:13:47.995518] 2023-01-11T22:13:51.9045722Z 2023-01-11T22:13:51.9046429Z Expand the folded group to see the log file of distributed/checkpoint/test_utils 2023-01-11T22:13:51.9047501Z ##[group]PRINTING LOG FILE of distributed/checkpoint/test_utils (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_utils_x68bh228) 2023-01-11T22:13:51.9047871Z 2023-01-11T22:13:51.9048008Z Running tests... 2023-01-11T22:13:51.9048689Z ---------------------------------------------------------------------- 2023-01-11T22:13:51.9049271Z Test results will be stored in test-reports/python-unittest/distributed.checkpoint.test_utils 2023-01-11T22:13:51.9049722Z test_flat_data (__main__.TestMedatadaIndex) ... ok (1.621s) 2023-01-11T22:13:51.9050105Z test_index_hint_ignored_on_equals (__main__.TestMedatadaIndex) ... ok (0.001s) 2023-01-11T22:13:51.9050539Z test_index_hint_ignored_on_hash (__main__.TestMedatadaIndex) ... ok (0.001s) 2023-01-11T22:13:51.9051099Z test_init_convert_offset (__main__.TestMedatadaIndex) ... ok (0.001s) 2023-01-11T22:13:51.9051528Z test_sharded_tensor_lookup (__main__.TestMedatadaIndex) ... ok (0.003s) 2023-01-11T22:13:51.9051744Z 2023-01-11T22:13:51.9052025Z ---------------------------------------------------------------------- 2023-01-11T22:13:51.9052363Z Ran 5 tests in 1.626s 2023-01-11T22:13:51.9052529Z 2023-01-11T22:13:51.9052623Z OK 2023-01-11T22:13:51.9052759Z 2023-01-11T22:13:51.9053088Z Generating XML reports... 
2023-01-11T22:13:51.9053730Z Generated XML report: test-reports/python-unittest/distributed.checkpoint.test_utils/TEST-TestMedatadaIndex-20230111221349.xml 2023-01-11T22:13:51.9054090Z 2023-01-11T22:13:51.9054411Z ##[endgroup] 2023-01-11T22:13:51.9055004Z FINISHED PRINTING LOG FILE of distributed/checkpoint/test_utils (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_utils_x68bh228) 2023-01-11T22:13:51.9055370Z 2023-01-11T22:13:51.9055662Z Running distributed/_tensor/test_pointwise_ops ... [2023-01-11 22:13:51.904696] 2023-01-11T22:13:51.9056358Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_tensor/test_pointwise_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:13:51.904970] 2023-01-11T22:13:55.8798541Z 2023-01-11T22:13:55.8799450Z Expand the folded group to see the log file of distributed/_tensor/test_pointwise_ops 2023-01-11T22:13:55.8800420Z ##[group]PRINTING LOG FILE of distributed/_tensor/test_pointwise_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_tensor-test_pointwise_ops_nrcomshs) 2023-01-11T22:13:55.8800797Z 2023-01-11T22:13:55.8800913Z Running tests... 2023-01-11T22:13:55.8801473Z ---------------------------------------------------------------------- 2023-01-11T22:13:55.8802023Z Test results will be stored in test-reports/python-unittest/distributed._tensor.test_pointwise_ops 2023-01-11T22:13:55.8802852Z test_activations (__main__.DistElementwiseOpsTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:13:55.8803387Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:55.8803890Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:55.8804510Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:55.8805049Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:55.8805771Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8806489Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8807225Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8807956Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8808375Z ok (1.633s) 2023-01-11T22:13:55.8808824Z test_dropout (__main__.DistElementwiseOpsTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:55.8809426Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:55.8809947Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:55.8810452Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:55.8811145Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:13:55.8811879Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8812601Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8813553Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8814271Z INFO:torch.testing._internal.common_distributed:Thread 3 skipping test test_dropout for following reason: testing RNG based ops is broken: https://github.com/pytorch/tau/issues/494 2023-01-11T22:13:55.8815042Z INFO:torch.testing._internal.common_distributed:Thread 0 skipping test test_dropout for following reason: testing RNG based ops is broken: https://github.com/pytorch/tau/issues/494 2023-01-11T22:13:55.8815806Z INFO:torch.testing._internal.common_distributed:Thread 2 skipping test test_dropout for following reason: testing RNG based ops is broken: https://github.com/pytorch/tau/issues/494 2023-01-11T22:13:55.8816571Z INFO:torch.testing._internal.common_distributed:Thread 1 skipping test test_dropout for following reason: testing RNG based ops is broken: https://github.com/pytorch/tau/issues/494 2023-01-11T22:13:55.8817172Z skip: Test skipped at subprocess level, look at subprocess log for skip reason (0.012s) 2023-01-11T22:13:55.8817774Z test_dropout_backward (__main__.DistElementwiseOpsTest) ... 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:55.8818382Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:55.8818903Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:55.8819406Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:55.8820099Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8820967Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8821696Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8822475Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8822908Z ok (0.021s) 2023-01-11T22:13:55.8823389Z test_dropout_errors (__main__.DistElementwiseOpsTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:55.8823974Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:55.8824494Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:55.8825021Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:55.8825718Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:13:55.8826420Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8827154Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8827908Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8828328Z ok (0.013s) 2023-01-11T22:13:55.8828773Z test_mul_out (__main__.DistElementwiseOpsTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:55.8829368Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:55.8829890Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:55.8830579Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8831123Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:55.8831813Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8832537Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:55.8833246Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:13:55.8833664Z ok (0.016s) 2023-01-11T22:13:55.8833828Z 2023-01-11T22:13:55.8834114Z ---------------------------------------------------------------------- 2023-01-11T22:13:55.8834466Z Ran 5 tests in 1.696s 2023-01-11T22:13:55.8834622Z 2023-01-11T22:13:55.8834737Z OK (skipped=1) 2023-01-11T22:13:55.8834898Z 2023-01-11T22:13:55.8835042Z Generating XML reports... 2023-01-11T22:13:55.8835697Z Generated XML report: test-reports/python-unittest/distributed._tensor.test_pointwise_ops/TEST-DistElementwiseOpsTest-20230111221353.xml 2023-01-11T22:13:55.8836094Z 2023-01-11T22:13:55.8836405Z ##[endgroup] 2023-01-11T22:13:55.8837047Z FINISHED PRINTING LOG FILE of distributed/_tensor/test_pointwise_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_tensor-test_pointwise_ops_nrcomshs) 2023-01-11T22:13:55.8837437Z 2023-01-11T22:13:55.8837725Z Running distributed/test_multi_threaded_pg ... [2023-01-11 22:13:55.880091] 2023-01-11T22:13:55.8838450Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_multi_threaded_pg.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:13:55.880382] 2023-01-11T22:13:59.9130578Z 2023-01-11T22:13:59.9131310Z Expand the folded group to see the log file of distributed/test_multi_threaded_pg 2023-01-11T22:13:59.9132255Z ##[group]PRINTING LOG FILE of distributed/test_multi_threaded_pg (/var/lib/jenkins/workspace/test/test-reports/distributed-test_multi_threaded_pg_41g0dkg9) 2023-01-11T22:13:59.9132618Z 2023-01-11T22:13:59.9132712Z Running tests... 2023-01-11T22:13:59.9133829Z ---------------------------------------------------------------------- 2023-01-11T22:13:59.9134786Z Test results will be stored in test-reports/python-unittest/distributed.test_multi_threaded_pg 2023-01-11T22:13:59.9135479Z test_all_reduce (__main__.TestCollectivesWithBaseClass) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:13:59.9136051Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:59.9136520Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:59.9137018Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:59.9137699Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9138384Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9138904Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:59.9139554Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9140230Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9140601Z ok (1.640s) 2023-01-11T22:13:59.9141073Z test_allgather (__main__.TestCollectivesWithBaseClass) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:59.9141653Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:59.9142140Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:59.9142610Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:59.9143264Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:13:59.9143939Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9144615Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9145272Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9145660Z ok (0.017s) 2023-01-11T22:13:59.9146132Z test_assert_equal_on_rank (__main__.TestCollectivesWithBaseClass) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:59.9146716Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:59.9147187Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:59.9147669Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:59.9148317Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9148971Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9149796Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9150475Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9150862Z ok (0.014s) 2023-01-11T22:13:59.9151364Z test_broadcast (__main__.TestCollectivesWithBaseClass) ... 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:59.9151951Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:59.9152437Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:59.9152918Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:59.9153556Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9154237Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9154911Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9155586Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9155956Z ok (0.019s) 2023-01-11T22:13:59.9156427Z test_broadcast_object_list (__main__.TestCollectivesWithBaseClass) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:59.9157014Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:59.9157479Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:59.9157963Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:59.9158611Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:13:59.9159284Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9159944Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9160618Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9161027Z 3 -> 4 2023-01-11T22:13:59.9161264Z 1 -> 4 2023-01-11T22:13:59.9161481Z 0 -> 4 2023-01-11T22:13:59.9161712Z 2 -> 4 2023-01-11T22:13:59.9161926Z ok (0.016s) 2023-01-11T22:13:59.9162377Z test_reduce_scatter (__main__.TestCollectivesWithBaseClass) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:59.9162965Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:59.9163446Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:59.9164078Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9164602Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:59.9165244Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9165915Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9166571Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9167043Z ok (0.014s) 2023-01-11T22:13:59.9167500Z test_scatter (__main__.TestCollectivesWithBaseClass) ... 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:59.9168071Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:59.9168589Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:59.9169079Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:59.9169731Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9170406Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9171069Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9171741Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9172126Z ok (0.014s) 2023-01-11T22:13:59.9172591Z test_broadcast_object_list (__main__.TestCollectivesWithWrapper) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:59.9173356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:59.9173842Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:59.9174492Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:13:59.9174999Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:59.9175644Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9176312Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9176986Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9177347Z ok (0.015s) 2023-01-11T22:13:59.9177835Z test_collective_error_on_rank_non_zero (__main__.TestCollectivesWithWrapper) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:59.9178436Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:59.9178919Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:59.9179552Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9180072Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:59.9180711Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9181387Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9182044Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:13:59.9182532Z ERROR:torch.testing._internal.common_distributed:Caught exception: 2023-01-11T22:13:59.9182887Z Traceback (most recent call last): 2023-01-11T22:13:59.9183423Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/distributed/multi_threaded_pg.py", line 365, in worker 2023-01-11T22:13:59.9183918Z callback() 2023-01-11T22:13:59.9184422Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 896, in 2023-01-11T22:13:59.9184871Z "runTest", timeout, world_size, lambda: func(self, *args, **kwargs) 2023-01-11T22:13:59.9185296Z File "/var/lib/jenkins/workspace/test/distributed/test_multi_threaded_pg.py", line 57, in _test_method 2023-01-11T22:13:59.9185804Z raise AssertionError("Mimic real test failure.") # fail on rank 1 2023-01-11T22:13:59.9186169Z AssertionError: Mimic real test failure. 2023-01-11T22:13:59.9186435Z exiting thread 1 2023-01-11T22:13:59.9186672Z ok (0.013s) 2023-01-11T22:13:59.9187168Z test_collective_error_on_rank_non_zero_all (__main__.TestCollectivesWithWrapper) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:59.9187752Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:59.9188244Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:59.9188728Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:59.9189388Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9190054Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:13:59.9190733Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9191410Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9191899Z ERROR:torch.testing._internal.common_distributed:Caught exception: 2023-01-11T22:13:59.9192245Z Traceback (most recent call last): 2023-01-11T22:13:59.9192796Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/distributed/multi_threaded_pg.py", line 365, in worker 2023-01-11T22:13:59.9193181Z callback() 2023-01-11T22:13:59.9193657Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 896, in 2023-01-11T22:13:59.9194144Z "runTest", timeout, world_size, lambda: func(self, *args, **kwargs) 2023-01-11T22:13:59.9194587Z File "/var/lib/jenkins/workspace/test/distributed/test_multi_threaded_pg.py", line 72, in _test_method 2023-01-11T22:13:59.9195120Z raise AssertionError("Mimic real test failure.") # fail on all non-zero rank 2023-01-11T22:13:59.9195486Z AssertionError: Mimic real test failure. 2023-01-11T22:13:59.9195751Z exiting thread 1 2023-01-11T22:13:59.9195991Z ok (0.013s) 2023-01-11T22:13:59.9196476Z test_collective_error_on_rank_zero (__main__.TestCollectivesWithWrapper) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:59.9197064Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:59.9197546Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:59.9198197Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:13:59.9198722Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:59.9199340Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9200012Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9200690Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9201234Z ERROR:torch.testing._internal.common_distributed:Caught exception: 2023-01-11T22:13:59.9201589Z Traceback (most recent call last): 2023-01-11T22:13:59.9202141Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/distributed/multi_threaded_pg.py", line 365, in worker 2023-01-11T22:13:59.9202527Z callback() 2023-01-11T22:13:59.9203058Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 896, in 2023-01-11T22:13:59.9203521Z "runTest", timeout, world_size, lambda: func(self, *args, **kwargs) 2023-01-11T22:13:59.9203960Z File "/var/lib/jenkins/workspace/test/distributed/test_multi_threaded_pg.py", line 42, in _test_method 2023-01-11T22:13:59.9204379Z raise AssertionError("Mimic real test failure.") # fail on rank 0 2023-01-11T22:13:59.9204727Z AssertionError: Mimic real test failure. 2023-01-11T22:13:59.9205012Z exiting thread 0 2023-01-11T22:13:59.9205257Z ok (0.004s) 2023-01-11T22:13:59.9205695Z test_skip (__main__.TestCollectivesWithWrapper) ... 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:13:59.9206262Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:13:59.9206747Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:13:59.9207388Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9208067Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9208597Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:13:59.9209235Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9209899Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:13:59.9210495Z INFO:torch.testing._internal.common_distributed:Thread 1 skipping test runTest for following reason: check if skip exception can be captured correctly. 2023-01-11T22:13:59.9211121Z INFO:torch.testing._internal.common_distributed:Thread 2 skipping test runTest for following reason: check if skip exception can be captured correctly. 2023-01-11T22:13:59.9211741Z INFO:torch.testing._internal.common_distributed:Thread 3 skipping test runTest for following reason: check if skip exception can be captured correctly. 2023-01-11T22:13:59.9212344Z INFO:torch.testing._internal.common_distributed:Thread 0 skipping test runTest for following reason: check if skip exception can be captured correctly. 
2023-01-11T22:13:59.9212758Z ok (0.012s) 2023-01-11T22:13:59.9213045Z 2023-01-11T22:13:59.9213332Z ---------------------------------------------------------------------- 2023-01-11T22:13:59.9213671Z Ran 12 tests in 1.793s 2023-01-11T22:13:59.9213817Z 2023-01-11T22:13:59.9213911Z OK 2023-01-11T22:13:59.9214044Z 2023-01-11T22:13:59.9214168Z Generating XML reports... 2023-01-11T22:13:59.9214809Z Generated XML report: test-reports/python-unittest/distributed.test_multi_threaded_pg/TEST-TestCollectivesWithBaseClass-20230111221357.xml 2023-01-11T22:13:59.9215631Z Generated XML report: test-reports/python-unittest/distributed.test_multi_threaded_pg/TEST-TestCollectivesWithWrapper-20230111221357.xml 2023-01-11T22:13:59.9216009Z 2023-01-11T22:13:59.9216337Z ##[endgroup] 2023-01-11T22:13:59.9216936Z FINISHED PRINTING LOG FILE of distributed/test_multi_threaded_pg (/var/lib/jenkins/workspace/test/test-reports/distributed-test_multi_threaded_pg_41g0dkg9) 2023-01-11T22:13:59.9217282Z 2023-01-11T22:13:59.9217577Z Running distributed/checkpoint/test_fsdp_optim_state ... [2023-01-11 22:13:59.913392] 2023-01-11T22:13:59.9218392Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/checkpoint/test_fsdp_optim_state.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:13:59.913673] 2023-01-11T22:14:06.1820275Z 2023-01-11T22:14:06.1820985Z Expand the folded group to see the log file of distributed/checkpoint/test_fsdp_optim_state 2023-01-11T22:14:06.1822440Z ##[group]PRINTING LOG FILE of distributed/checkpoint/test_fsdp_optim_state (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_fsdp_optim_state_vk33fo2n) 2023-01-11T22:14:06.1822869Z 2023-01-11T22:14:06.1822989Z Running tests... 
2023-01-11T22:14:06.1823792Z ---------------------------------------------------------------------- 2023-01-11T22:14:06.1824511Z Test results will be stored in test-reports/python-unittest/distributed.checkpoint.test_fsdp_optim_state 2023-01-11T22:14:06.1825053Z test_distributed_tensor_planner (__main__.FsdpOptimStateCheckpoint) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:14:06.1825656Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41481 2023-01-11T22:14:06.1826356Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41482 2023-01-11T22:14:06.1826994Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:06.1827431Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:06.1828008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:06.1828511Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:06.1829074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:06.1829525Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:06.1830098Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:06.1830567Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:06.1830985Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:06.1831475Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:14:06.1831957Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:06.1832426Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:14:06.1833077Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:06.1833765Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:06.1834202Z skip: Need at least 4 CUDA devices (3.904s) 2023-01-11T22:14:06.1834402Z 2023-01-11T22:14:06.1834655Z ---------------------------------------------------------------------- 2023-01-11T22:14:06.1834983Z Ran 1 test in 3.904s 2023-01-11T22:14:06.1835145Z 2023-01-11T22:14:06.1835256Z OK (skipped=1) 2023-01-11T22:14:06.1835411Z 2023-01-11T22:14:06.1835536Z Generating XML reports... 2023-01-11T22:14:06.1836175Z Generated XML report: test-reports/python-unittest/distributed.checkpoint.test_fsdp_optim_state/TEST-FsdpOptimStateCheckpoint-20230111221401.xml 2023-01-11T22:14:06.1836573Z 2023-01-11T22:14:06.1836915Z ##[endgroup] 2023-01-11T22:14:06.1837568Z FINISHED PRINTING LOG FILE of distributed/checkpoint/test_fsdp_optim_state (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_fsdp_optim_state_vk33fo2n) 2023-01-11T22:14:06.1837955Z 2023-01-11T22:14:06.1838219Z Running distributed/fsdp/test_fsdp_traversal ... [2023-01-11 22:14:06.182085] 2023-01-11T22:14:06.1838904Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_traversal.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:14:06.182342] 2023-01-11T22:14:13.4187895Z 2023-01-11T22:14:13.4188838Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_traversal 2023-01-11T22:14:13.4190705Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_traversal (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_traversal_q1olz4pa) 2023-01-11T22:14:13.4191152Z 2023-01-11T22:14:13.4191268Z Running tests... 2023-01-11T22:14:13.4192029Z ---------------------------------------------------------------------- 2023-01-11T22:14:13.4192638Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_traversal 2023-01-11T22:14:13.4193490Z test_fsdp_modules (__main__.TestTraversal) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:14:13.4194371Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41586 2023-01-11T22:14:13.4195189Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41587 2023-01-11T22:14:13.4196327Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:13.4197180Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:13.4198317Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:13.4199331Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:13.4200533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:13.4201454Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:13.4202596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:13.4203503Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:13.4204392Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:14:13.4205351Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:14:13.4206652Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:13.4208051Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:13.4209080Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:13.4209972Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:13.4210682Z dist init r=0, world=2 2023-01-11T22:14:13.4211161Z dist init r=1, world=2 2023-01-11T22:14:13.4211609Z ok (4.930s) 2023-01-11T22:14:13.4211903Z 2023-01-11T22:14:13.4212447Z ---------------------------------------------------------------------- 2023-01-11T22:14:13.4213447Z Ran 1 test in 4.930s 2023-01-11T22:14:13.4213769Z 2023-01-11T22:14:13.4213954Z OK 2023-01-11T22:14:13.4214190Z 2023-01-11T22:14:13.4214430Z Generating XML reports... 2023-01-11T22:14:13.4215641Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_traversal/TEST-TestTraversal-20230111221408.xml 2023-01-11T22:14:13.4216361Z 2023-01-11T22:14:13.4216976Z ##[endgroup] 2023-01-11T22:14:13.4218214Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_traversal (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_traversal_q1olz4pa) 2023-01-11T22:14:13.4218969Z 2023-01-11T22:14:13.4219542Z Running distributed/fsdp/test_fsdp_uneven ... [2023-01-11 22:14:13.418850] 2023-01-11T22:14:13.4220959Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_uneven.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:14:13.419105] 2023-01-11T22:14:21.0478465Z 2023-01-11T22:14:21.0479169Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_uneven 2023-01-11T22:14:21.0480123Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_uneven (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_uneven_wx1c0z74) 2023-01-11T22:14:21.0480520Z 2023-01-11T22:14:21.0480638Z Running tests... 2023-01-11T22:14:21.0481228Z ---------------------------------------------------------------------- 2023-01-11T22:14:21.0482250Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_uneven 2023-01-11T22:14:21.0482727Z test_one_iteration (__main__.TestUnevenParamShard) 2023-01-11T22:14:21.0483316Z Test FSDP with uneven divide of parameter shards. ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:14:21.0483844Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41700 2023-01-11T22:14:21.0484295Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41701 2023-01-11T22:14:21.0484948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:21.0485386Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:21.0485961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:21.0486430Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:21.0486995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:21.0487442Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:21.0488010Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:21.0488471Z warnings.warn(f"loaded 
{len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:21.0488907Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:14:21.0489408Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:14:21.0490065Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:21.0490755Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:21.0491256Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:21.0491725Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:21.0492081Z dist init r=0, world=2 2023-01-11T22:14:21.0492318Z dist init r=1, world=2 2023-01-11T22:14:21.0492556Z ok (5.319s) 2023-01-11T22:14:21.0492708Z 2023-01-11T22:14:21.0493212Z ---------------------------------------------------------------------- 2023-01-11T22:14:21.0493540Z Ran 1 test in 5.319s 2023-01-11T22:14:21.0493701Z 2023-01-11T22:14:21.0493796Z OK 2023-01-11T22:14:21.0493932Z 2023-01-11T22:14:21.0494057Z Generating XML reports... 2023-01-11T22:14:21.0494674Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_uneven/TEST-TestUnevenParamShard-20230111221415.xml 2023-01-11T22:14:21.0495020Z 2023-01-11T22:14:21.0495347Z ##[endgroup] 2023-01-11T22:14:21.0495949Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_uneven (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_uneven_wx1c0z74) 2023-01-11T22:14:21.0496301Z 2023-01-11T22:14:21.0496608Z Running distributed/checkpoint/test_fsdp_model_state ... 
[2023-01-11 22:14:21.047890] 2023-01-11T22:14:21.0497306Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/checkpoint/test_fsdp_model_state.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:14:21.048151] 2023-01-11T22:14:29.5992199Z 2023-01-11T22:14:29.5992973Z Expand the folded group to see the log file of distributed/checkpoint/test_fsdp_model_state 2023-01-11T22:14:29.5994225Z ##[group]PRINTING LOG FILE of distributed/checkpoint/test_fsdp_model_state (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_fsdp_model_state_rv318dzd) 2023-01-11T22:14:29.5994623Z 2023-01-11T22:14:29.5994738Z Running tests... 2023-01-11T22:14:29.5995258Z ---------------------------------------------------------------------- 2023-01-11T22:14:29.5995959Z Test results will be stored in test-reports/python-unittest/distributed.checkpoint.test_fsdp_model_state 2023-01-11T22:14:29.5996551Z test_fsdp_model_state_no_resharding (__main__.FsdpModelStateCheckpoint) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:14:29.5997061Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41818 2023-01-11T22:14:29.5997495Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41819 2023-01-11T22:14:29.5998138Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:29.5998608Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:29.5999189Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:29.5999646Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:29.6000229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:29.6000674Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:29.6001269Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:29.6001720Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:29.6002162Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:29.6002656Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:14:29.6003143Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:29.6003605Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:14:29.6004263Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:14:29.6004952Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:29.6005376Z skip: Need at least 4 CUDA devices (3.889s) 2023-01-11T22:14:29.6005889Z test_fsdp_model_state_with_resharding (__main__.FsdpModelStateCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41888 2023-01-11T22:14:29.6006444Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41889 2023-01-11T22:14:29.6007059Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:29.6007492Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:29.6008065Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:29.6008532Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:29.6009107Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:29.6009532Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:29.6010105Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:29.6010569Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:29.6011070Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:29.6011562Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:14:29.6012045Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:29.6012523Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 
2023-01-11T22:14:29.6013538Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:29.6014264Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:29.6014706Z skip: Need at least 4 CUDA devices (2.308s) 2023-01-11T22:14:29.6014901Z 2023-01-11T22:14:29.6015171Z ---------------------------------------------------------------------- 2023-01-11T22:14:29.6015491Z Ran 2 tests in 6.197s 2023-01-11T22:14:29.6015653Z 2023-01-11T22:14:29.6015764Z OK (skipped=2) 2023-01-11T22:14:29.6015922Z 2023-01-11T22:14:29.6016051Z Generating XML reports... 2023-01-11T22:14:29.6016690Z Generated XML report: test-reports/python-unittest/distributed.checkpoint.test_fsdp_model_state/TEST-FsdpModelStateCheckpoint-20230111221422.xml 2023-01-11T22:14:29.6017086Z 2023-01-11T22:14:29.6017409Z ##[endgroup] 2023-01-11T22:14:29.6018066Z FINISHED PRINTING LOG FILE of distributed/checkpoint/test_fsdp_model_state (/var/lib/jenkins/workspace/test/test-reports/distributed-checkpoint-test_fsdp_model_state_rv318dzd) 2023-01-11T22:14:29.6018452Z 2023-01-11T22:14:29.6018743Z Running distributed/_shard/sharded_tensor/ops/test_embedding ... [2023-01-11 22:14:29.599289] 2023-01-11T22:14:29.6019479Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_embedding.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:14:29.599593] 2023-01-11T22:14:38.0631765Z 2023-01-11T22:14:38.0632498Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_embedding 2023-01-11T22:14:38.0634096Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_embedding (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_embedding_j31p6aq9) 2023-01-11T22:14:38.0634823Z 2023-01-11T22:14:38.0635121Z Running tests... 
2023-01-11T22:14:38.0636094Z ---------------------------------------------------------------------- 2023-01-11T22:14:38.0637314Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_embedding 2023-01-11T22:14:38.0637874Z test_sharded_embedding_colwise (__main__.TestShardedEmbedding) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:14:38.0638371Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41993 2023-01-11T22:14:38.0639014Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41994 2023-01-11T22:14:38.0639453Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 41995 2023-01-11T22:14:38.0639891Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 41996 2023-01-11T22:14:38.0640528Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:38.0640963Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:38.0641543Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:38.0642012Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:38.0642593Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:38.0643022Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:38.0643597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:38.0644347Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:38.0644937Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:38.0645362Z warnings.warn(f"loaded 
{len(slow_tests_dict)} slow tests") 2023-01-11T22:14:38.0646033Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:38.0646510Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:38.0647072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:38.0647523Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:38.0648090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:38.0648551Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:38.0648987Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:38.0649462Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:38.0649931Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:38.0650377Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:38.0650762Z skip: Need at least 4 CUDA devices (3.922s) 2023-01-11T22:14:38.0651253Z test_sharded_embedding_rowwise (__main__.TestShardedEmbedding) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42129 2023-01-11T22:14:38.0651772Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42130 2023-01-11T22:14:38.0652221Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 42131 2023-01-11T22:14:38.0652666Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 42132 2023-01-11T22:14:38.0653545Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:38.0653979Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:38.0654561Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:38.0655033Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:38.0655590Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:38.0656034Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:38.0656592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:38.0657043Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:38.0657594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:38.0658063Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:38.0658648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:38.0659109Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:38.0659669Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:14:38.0660138Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:38.0660709Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:38.0661291Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:38.0661713Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:38.0662182Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:38.0662652Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:38.0663164Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:38.0663565Z skip: Need at least 4 CUDA devices (2.311s) 2023-01-11T22:14:38.0663763Z 2023-01-11T22:14:38.0664050Z ---------------------------------------------------------------------- 2023-01-11T22:14:38.0664383Z Ran 2 tests in 6.233s 2023-01-11T22:14:38.0664527Z 2023-01-11T22:14:38.0664637Z OK (skipped=2) 2023-01-11T22:14:38.0664793Z 2023-01-11T22:14:38.0664919Z Generating XML reports... 2023-01-11T22:14:38.0665564Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_embedding/TEST-TestShardedEmbedding-20230111221431.xml 2023-01-11T22:14:38.0665950Z 2023-01-11T22:14:38.0666263Z ##[endgroup] 2023-01-11T22:14:38.0666944Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_embedding (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_embedding_j31p6aq9) 2023-01-11T22:14:38.0667344Z 2023-01-11T22:14:38.0667647Z Running distributed/_shard/sharded_tensor/ops/test_chunk ... 
[2023-01-11 22:14:38.063186] 2023-01-11T22:14:38.0668366Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_chunk.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:14:38.063555] 2023-01-11T22:14:46.6808337Z 2023-01-11T22:14:46.6808840Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_chunk 2023-01-11T22:14:46.6810079Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_chunk (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_chunk_yefgpxfp) 2023-01-11T22:14:46.6810515Z 2023-01-11T22:14:46.6810629Z Running tests... 2023-01-11T22:14:46.6811127Z ---------------------------------------------------------------------- 2023-01-11T22:14:46.6811908Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_chunk 2023-01-11T22:14:46.6812543Z test_sharded_chunk (__main__.TestShardedTensorChunkOps) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:14:46.6813519Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42300 2023-01-11T22:14:46.6813955Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42301 2023-01-11T22:14:46.6814392Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 42302 2023-01-11T22:14:46.6815067Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 42303 2023-01-11T22:14:46.6815693Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:46.6816145Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:46.6816758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:46.6817230Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:46.6817814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:46.6818246Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:46.6818816Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:46.6819284Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:46.6819841Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:46.6820556Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:46.6821136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:46.6821597Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:46.6822255Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:46.6822720Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:46.6823297Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:46.6823762Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:46.6824180Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:46.6824659Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:46.6825123Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:46.6825569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:46.6825958Z skip: Need at least 4 CUDA devices (3.930s) 2023-01-11T22:14:46.6826455Z test_sharded_chunk_error (__main__.TestShardedTensorChunkOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42436 2023-01-11T22:14:46.6826998Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42437 2023-01-11T22:14:46.6827425Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 42438 2023-01-11T22:14:46.6827862Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 42439 2023-01-11T22:14:46.6828473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:46.6828912Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:46.6829482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:46.6829950Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:46.6830567Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:46.6830991Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:46.6831558Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:46.6832021Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:46.6832575Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:46.6833021Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:46.6833587Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:46.6834044Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:46.6834600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:14:46.6835043Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:46.6835618Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:46.6836073Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:46.6836490Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:46.6837039Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:14:46.6837504Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:46.6837953Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:14:46.6838337Z skip: Need at least 4 CUDA devices (2.411s) 2023-01-11T22:14:46.6838530Z 2023-01-11T22:14:46.6838869Z ---------------------------------------------------------------------- 2023-01-11T22:14:46.6839211Z Ran 2 tests in 6.341s 2023-01-11T22:14:46.6839357Z 2023-01-11T22:14:46.6839469Z OK (skipped=2) 2023-01-11T22:14:46.6839623Z 2023-01-11T22:14:46.6839747Z Generating XML reports... 2023-01-11T22:14:46.6840411Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_chunk/TEST-TestShardedTensorChunkOps-20230111221439.xml 2023-01-11T22:14:46.6840802Z 2023-01-11T22:14:46.6841109Z ##[endgroup] 2023-01-11T22:14:46.6841761Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_chunk (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_chunk_yefgpxfp) 2023-01-11T22:14:46.6842151Z 2023-01-11T22:14:46.6842421Z Running distributed/test_c10d_error_logger ... 
[2023-01-11 22:14:46.680937] 2023-01-11T22:14:46.6843104Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_c10d_error_logger.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:14:46.681192] 2023-01-11T22:14:56.1118535Z 2023-01-11T22:14:56.1119069Z Expand the folded group to see the log file of distributed/test_c10d_error_logger 2023-01-11T22:14:56.1120145Z ##[group]PRINTING LOG FILE of distributed/test_c10d_error_logger (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_error_logger_3xb8ryv8) 2023-01-11T22:14:56.1120692Z 2023-01-11T22:14:56.1120906Z Running tests... 2023-01-11T22:14:56.1121480Z ---------------------------------------------------------------------- 2023-01-11T22:14:56.1122045Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_error_logger 2023-01-11T22:14:56.1122602Z test_exception_handler_with_dist (__main__.C10dErrorLoggerTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:14:56.1123088Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42607 2023-01-11T22:14:56.1123532Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42608 2023-01-11T22:14:56.1124140Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:56.1124591Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:56.1125168Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:56.1125623Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:56.1126202Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:56.1126653Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:56.1127224Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:56.1127671Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:56.1128113Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:56.1128600Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:14:56.1129080Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:56.1129543Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:14:56.1130195Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:56.1131212Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:14:56.1131587Z ok (4.820s) 2023-01-11T22:14:56.1132022Z test_get_or_create_logger (__main__.C10dErrorLoggerTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42686 2023-01-11T22:14:56.1132647Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42687 2023-01-11T22:14:56.1133759Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:56.1134201Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:56.1134779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:56.1135246Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:56.1135814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:14:56.1136255Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:14:56.1136823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:14:56.1137282Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:14:56.1137706Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:14:56.1138174Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:14:56.1138517Z ok (2.308s) 2023-01-11T22:14:56.1138666Z 2023-01-11T22:14:56.1138938Z ---------------------------------------------------------------------- 2023-01-11T22:14:56.1139251Z Ran 2 tests in 7.129s 2023-01-11T22:14:56.1139415Z 2023-01-11T22:14:56.1139507Z OK 2023-01-11T22:14:56.1139643Z 2023-01-11T22:14:56.1139768Z Generating XML reports... 
2023-01-11T22:14:56.1140348Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_error_logger/TEST-C10dErrorLoggerTest-20230111221448.xml 2023-01-11T22:14:56.1140695Z 2023-01-11T22:14:56.1141015Z ##[endgroup] 2023-01-11T22:14:56.1141606Z FINISHED PRINTING LOG FILE of distributed/test_c10d_error_logger (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_error_logger_3xb8ryv8) 2023-01-11T22:14:56.1141956Z 2023-01-11T22:14:56.1142237Z Running distributed/_shard/sharded_tensor/ops/test_init ... [2023-01-11 22:14:56.111905] 2023-01-11T22:14:56.1142955Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_init.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:14:56.112169] 2023-01-11T22:15:07.0439808Z 2023-01-11T22:15:07.0440595Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_init 2023-01-11T22:15:07.0442331Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_init (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_init_zz7srn54) 2023-01-11T22:15:07.0442730Z 2023-01-11T22:15:07.0442825Z Running tests... 2023-01-11T22:15:07.0443354Z ---------------------------------------------------------------------- 2023-01-11T22:15:07.0443948Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_init 2023-01-11T22:15:07.0444893Z test_init_sharded_tensor_with_kaiming_uniform (__main__.TestShardedTensorNNInit) 2023-01-11T22:15:07.0445992Z Test torch.nn.init.kaiming_uniform_(ShardedTensor, a, mode, nonlinearit) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:15:07.0447056Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42789 2023-01-11T22:15:07.0447521Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42790 2023-01-11T22:15:07.0448247Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 42791 2023-01-11T22:15:07.0449373Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 42792 2023-01-11T22:15:07.0450672Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:07.0451616Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:07.0453433Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:07.0453991Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:07.0454646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:07.0455101Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:07.0455661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:07.0456142Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:07.0456722Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:07.0457171Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:07.0457734Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:07.0458198Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:07.0458777Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:07.0459223Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:07.0459777Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:07.0460245Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:07.0460685Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:07.0461140Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:07.0461599Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:07.0462073Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:07.0462467Z skip: Need at least 4 CUDA devices (4.042s) 2023-01-11T22:15:07.0462830Z test_init_sharded_tensor_with_normal (__main__.TestShardedTensorNNInit) 2023-01-11T22:15:07.0463357Z Test torch.nn.init.normal_(ShardedTensor, mean, std) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42925 2023-01-11T22:15:07.0463882Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42926 2023-01-11T22:15:07.0464313Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 42927 2023-01-11T22:15:07.0464757Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 42928 2023-01-11T22:15:07.0465366Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:07.0465819Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:07.0466380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:07.0466849Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:07.0479100Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:07.0479642Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:07.0480250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:07.0480909Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:07.0481504Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:07.0481947Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:07.0482565Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:07.0483043Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:07.0483628Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:15:07.0484053Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:07.0484634Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:07.0485101Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:07.0485538Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:07.0485996Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:07.0486456Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:07.0486927Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:07.0487303Z skip: Need at least 4 CUDA devices (2.311s) 2023-01-11T22:15:07.0487687Z test_init_sharded_tensor_with_uniform (__main__.TestShardedTensorNNInit) 2023-01-11T22:15:07.0488208Z Test torch.nn.init.uniform_(ShardedTensor, a, b) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43061 2023-01-11T22:15:07.0488722Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43062 2023-01-11T22:15:07.0489152Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 43063 2023-01-11T22:15:07.0489594Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 43064 2023-01-11T22:15:07.0490211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:07.0490641Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:07.0491219Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:07.0491687Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:07.0492261Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:07.0492684Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:07.0493565Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:07.0494020Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:07.0494594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:07.0495049Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:07.0495644Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:07.0496111Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:07.0496667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:15:07.0497108Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:07.0497674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:07.0498249Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:07.0498665Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:07.0499137Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:07.0499602Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:07.0500126Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:07.0500534Z skip: Need at least 4 CUDA devices (2.311s) 2023-01-11T22:15:07.0500730Z 2023-01-11T22:15:07.0501019Z ---------------------------------------------------------------------- 2023-01-11T22:15:07.0501350Z Ran 3 tests in 8.666s 2023-01-11T22:15:07.0501498Z 2023-01-11T22:15:07.0501610Z OK (skipped=3) 2023-01-11T22:15:07.0501766Z 2023-01-11T22:15:07.0501892Z Generating XML reports... 2023-01-11T22:15:07.0502550Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_init/TEST-TestShardedTensorNNInit-20230111221457.xml 2023-01-11T22:15:07.0502936Z 2023-01-11T22:15:07.0503265Z ##[endgroup] 2023-01-11T22:15:07.0503918Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_init (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_init_zz7srn54) 2023-01-11T22:15:07.0504303Z 2023-01-11T22:15:07.0504585Z Running distributed/fsdp/test_fsdp_pure_fp16 ... 
[2023-01-11 22:15:07.044041] 2023-01-11T22:15:07.0505274Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_pure_fp16.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:15:07.044296] 2023-01-11T22:15:18.6376298Z 2023-01-11T22:15:18.6377033Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_pure_fp16 2023-01-11T22:15:18.6378105Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_pure_fp16 (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_pure_fp16_ua0ik34k) 2023-01-11T22:15:18.6378557Z 2023-01-11T22:15:18.6378679Z Running tests... 2023-01-11T22:15:18.6379187Z ---------------------------------------------------------------------- 2023-01-11T22:15:18.6379767Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_pure_fp16 2023-01-11T22:15:18.6380621Z test_pure_fp16_cpu_offload_CPUOffload(offload_params=False) (__main__.TestPureFP16) 2023-01-11T22:15:18.6381252Z Tests pure FP16 training, including when the parameter's dtype is ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:15:18.6381750Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43232 2023-01-11T22:15:18.6382204Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43233 2023-01-11T22:15:18.6383198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:18.6383646Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:18.6384232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:18.6384707Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:18.6385287Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:18.6385722Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:18.6386298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:18.6386763Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:18.6387201Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:15:18.6387695Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:15:18.6388649Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:15:18.6389343Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:15:18.6389849Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:18.6390421Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:18.6391706Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:15:18.6392483Z warnings.warn( 2023-01-11T22:15:18.6393643Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:15:18.6394406Z warnings.warn( 2023-01-11T22:15:18.6394640Z dist init r=0, world=2 2023-01-11T22:15:18.6394894Z dist init r=1, world=2 2023-01-11T22:15:18.6395135Z ok (5.447s) 2023-01-11T22:15:18.6395466Z test_pure_fp16_cpu_offload_CPUOffload(offload_params=True) (__main__.TestPureFP16) 2023-01-11T22:15:18.6396150Z Tests pure FP16 training, including when the parameter's dtype is ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43315 2023-01-11T22:15:18.6396688Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43316 2023-01-11T22:15:18.6397277Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:18.6397725Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:18.6398307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:18.6398776Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:18.6399336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:18.6399786Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:18.6400358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:18.6400831Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:18.6401266Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:15:18.6401762Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:15:18.6402424Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:15:18.6403096Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:15:18.6403617Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:18.6404086Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:18.6405350Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:15:18.6406196Z warnings.warn( 2023-01-11T22:15:18.6407369Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:15:18.6408153Z warnings.warn( 2023-01-11T22:15:18.6408429Z File "", line 1, in 2023-01-11T22:15:18.6408803Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:15:18.6409160Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:15:18.6409532Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:15:18.6409906Z return self._bootstrap(parent_sentinel) 2023-01-11T22:15:18.6410284Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:15:18.6410625Z self.run() 2023-01-11T22:15:18.6410961Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:15:18.6411329Z self._target(*self._args, **self._kwargs) 2023-01-11T22:15:18.6411836Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:15:18.6412230Z self.run_test(test_name, pipe) 2023-01-11T22:15:18.6412755Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:15:18.6413385Z getattr(self, test_name)() 2023-01-11T22:15:18.6413911Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:15:18.6414285Z fn() 2023-01-11T22:15:18.6414758Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:15:18.6415160Z test(self, **param_kwargs) 2023-01-11T22:15:18.6415673Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:15:18.6416064Z return func(*args, **kwargs) 2023-01-11T22:15:18.6416442Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_pure_fp16.py", line 47, in test_pure_fp16 2023-01-11T22:15:18.6416815Z self._test_fsdp_parity( 2023-01-11T22:15:18.6417335Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:15:18.6417743Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:15:18.6418304Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:15:18.6418702Z output = model(*input) 2023-01-11T22:15:18.6419184Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:15:18.6419554Z return forward_call(*args, **kwargs) 2023-01-11T22:15:18.6420099Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:15:18.6420554Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:15:18.6421099Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:15:18.6421496Z _lazy_init(state, module) 2023-01-11T22:15:18.6422132Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:15:18.6422572Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:15:18.6423142Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:15:18.6423573Z handle.init_flat_param_attributes() 2023-01-11T22:15:18.6424171Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:15:18.6424552Z return func(*args, **kwargs) 2023-01-11T22:15:18.6425098Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:15:18.6425484Z p_assert( 2023-01-11T22:15:18.6425957Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in 
p_assert 2023-01-11T22:15:18.6426319Z traceback.print_stack() 2023-01-11T22:15:18.6426611Z File "", line 1, in 2023-01-11T22:15:18.6426983Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:15:18.6427337Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:15:18.6427715Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:15:18.6428087Z return self._bootstrap(parent_sentinel) 2023-01-11T22:15:18.6428461Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:15:18.6428797Z self.run() 2023-01-11T22:15:18.6429133Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:15:18.6429508Z self._target(*self._args, **self._kwargs) 2023-01-11T22:15:18.6430006Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:15:18.6430399Z self.run_test(test_name, pipe) 2023-01-11T22:15:18.6430926Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:15:18.6431308Z getattr(self, test_name)() 2023-01-11T22:15:18.6431856Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:15:18.6432227Z fn() 2023-01-11T22:15:18.6432755Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:15:18.6433137Z test(self, **param_kwargs) 2023-01-11T22:15:18.6433654Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:15:18.6434044Z return func(*args, **kwargs) 2023-01-11T22:15:18.6434424Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_pure_fp16.py", line 47, in test_pure_fp16 2023-01-11T22:15:18.6434798Z self._test_fsdp_parity( 2023-01-11T22:15:18.6435324Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:15:18.6435749Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:15:18.6436287Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:15:18.6436682Z output = model(*input) 2023-01-11T22:15:18.6437165Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:15:18.6437535Z return forward_call(*args, **kwargs) 2023-01-11T22:15:18.6438080Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:15:18.6438532Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:15:18.6439095Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:15:18.6439472Z _lazy_init(state, module) 2023-01-11T22:15:18.6440064Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:15:18.6440497Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:15:18.6441067Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:15:18.6441507Z handle.init_flat_param_attributes() 2023-01-11T22:15:18.6442077Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:15:18.6442464Z return func(*args, **kwargs) 2023-01-11T22:15:18.6442988Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:15:18.6443373Z p_assert( 2023-01-11T22:15:18.6443843Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in 
p_assert 2023-01-11T22:15:18.6444212Z traceback.print_stack() 2023-01-11T22:15:18.6444481Z dist init r=0, world=2 2023-01-11T22:15:18.6444954Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:15:18.6445401Z dist init r=1, world=2 2023-01-11T22:15:18.6445853Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:15:18.6446277Z ok (3.812s) 2023-01-11T22:15:18.6446428Z 2023-01-11T22:15:18.6446704Z ---------------------------------------------------------------------- 2023-01-11T22:15:18.6447019Z Ran 2 tests in 9.259s 2023-01-11T22:15:18.6447185Z 2023-01-11T22:15:18.6447281Z OK 2023-01-11T22:15:18.6447417Z 2023-01-11T22:15:18.6447544Z Generating XML reports... 2023-01-11T22:15:18.6448130Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_pure_fp16/TEST-TestPureFP16-20230111221508.xml 2023-01-11T22:15:18.6448456Z 2023-01-11T22:15:18.6448791Z ##[endgroup] 2023-01-11T22:15:18.6449405Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_pure_fp16 (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_pure_fp16_ua0ik34k) 2023-01-11T22:15:18.6449762Z 2023-01-11T22:15:18.6450072Z Running distributed/_shard/sharded_tensor/ops/test_binary_cmp ... [2023-01-11 22:15:18.637774] 2023-01-11T22:15:18.6450785Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_binary_cmp.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:15:18.638065] 2023-01-11T22:15:32.0777915Z 2023-01-11T22:15:32.0778423Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_binary_cmp 2023-01-11T22:15:32.0779493Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_binary_cmp (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_binary_cmp_2k7xuiah) 2023-01-11T22:15:32.0779945Z 2023-01-11T22:15:32.0780060Z Running tests... 2023-01-11T22:15:32.0780572Z ---------------------------------------------------------------------- 2023-01-11T22:15:32.0781149Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_binary_cmp 2023-01-11T22:15:32.0781639Z test_torch_allclose (__main__.TestShardedTensorBinaryOps) 2023-01-11T22:15:32.0782103Z Test torch.allclose(ShardedTensor, ShardedTensor) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:15:32.0782582Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43433 2023-01-11T22:15:32.0783020Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43434 2023-01-11T22:15:32.0783470Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 43435 2023-01-11T22:15:32.0783905Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 43436 2023-01-11T22:15:32.0784791Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:32.0785247Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0785862Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0786339Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0787020Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 
slow tests 2023-01-11T22:15:32.0787464Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0788049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0788517Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0789072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:32.0789537Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0790110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0790587Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0791150Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:32.0791597Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0792169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0792633Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0793054Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:32.0793531Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:32.0793998Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:32.0794442Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:32.0794824Z skip: Need at least 4 CUDA devices (4.031s) 2023-01-11T22:15:32.0795335Z test_torch_allclose_tensor_specs (__main__.TestShardedTensorBinaryOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43569 2023-01-11T22:15:32.0795885Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43570 2023-01-11T22:15:32.0796312Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 43571 2023-01-11T22:15:32.0796743Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 43572 2023-01-11T22:15:32.0797354Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:32.0797791Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0798363Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0798827Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0799406Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:32.0799832Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0800405Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0800864Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0801420Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:32.0801940Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0802517Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0802977Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0803591Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:15:32.0804049Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0804621Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0805079Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0805496Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:32.0805963Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:32.0806429Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:32.0806872Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:32.0807264Z skip: Need at least 4 CUDA devices (2.412s) 2023-01-11T22:15:32.0807622Z test_torch_equal (__main__.TestShardedTensorBinaryOps) 2023-01-11T22:15:32.0808126Z Test torch.equal(ShardedTensor, ShardedTensor) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43705 2023-01-11T22:15:32.0808626Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43706 2023-01-11T22:15:32.0809068Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 43707 2023-01-11T22:15:32.0809509Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 43708 2023-01-11T22:15:32.0810101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:32.0810591Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0811169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0811635Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0812196Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:32.0812640Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0814086Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0814977Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0816077Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:32.0816832Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0817413Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0817880Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0818437Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:15:32.0818881Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0819451Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0819893Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0820326Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:32.0820793Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:32.0821398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:32.0821840Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:32.0822226Z skip: Need at least 4 CUDA devices (2.311s) 2023-01-11T22:15:32.0822795Z test_torch_equal_tensor_specs (__main__.TestShardedTensorBinaryOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43841 2023-01-11T22:15:32.0823343Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43842 2023-01-11T22:15:32.0823785Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 43843 2023-01-11T22:15:32.0824217Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 43844 2023-01-11T22:15:32.0824833Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:32.0825270Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0825841Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0826301Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0826873Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:32.0827295Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0827847Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:32.0828288Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0828842Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0829307Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0829887Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0830342Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0830897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:32.0831347Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:32.0831916Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:32.0832358Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:32.0832832Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:15:32.0833302Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:32.0833770Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:15:32.0834215Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:32.0834599Z skip: Need at least 4 CUDA devices (2.410s) 
2023-01-11T22:15:32.0834791Z 2023-01-11T22:15:32.0835067Z ---------------------------------------------------------------------- 2023-01-11T22:15:32.0835385Z Ran 4 tests in 11.165s 2023-01-11T22:15:32.0835549Z 2023-01-11T22:15:32.0835659Z OK (skipped=4) 2023-01-11T22:15:32.0835815Z 2023-01-11T22:15:32.0835939Z Generating XML reports... 2023-01-11T22:15:32.0836607Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_binary_cmp/TEST-TestShardedTensorBinaryOps-20230111221520.xml 2023-01-11T22:15:32.0837006Z 2023-01-11T22:15:32.0837315Z ##[endgroup] 2023-01-11T22:15:32.0837988Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_binary_cmp (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_binary_cmp_2k7xuiah) 2023-01-11T22:15:32.0838466Z 2023-01-11T22:15:32.0838769Z Running distributed/tensor/parallel/test_2d_parallel ... [2023-01-11 22:15:32.077910] 2023-01-11T22:15:32.0839471Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/tensor/parallel/test_2d_parallel.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:15:32.078190] 2023-01-11T22:15:47.5852613Z 2023-01-11T22:15:47.5853671Z Expand the folded group to see the log file of distributed/tensor/parallel/test_2d_parallel 2023-01-11T22:15:47.5855769Z ##[group]PRINTING LOG FILE of distributed/tensor/parallel/test_2d_parallel (/var/lib/jenkins/workspace/test/test-reports/distributed-tensor-parallel-test_2d_parallel_qm3q1chs) 2023-01-11T22:15:47.5856668Z 2023-01-11T22:15:47.5856899Z Running tests... 2023-01-11T22:15:47.5857542Z ---------------------------------------------------------------------- 2023-01-11T22:15:47.5858168Z Test results will be stored in test-reports/python-unittest/distributed.tensor.parallel.test_2d_parallel 2023-01-11T22:15:47.5858747Z test_2d_fsdp_integration_correctness (__main__.Test2dParallelIntegration) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:15:47.5859261Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44012 2023-01-11T22:15:47.5859694Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44013 2023-01-11T22:15:47.5860313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:47.5860769Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:47.5861330Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:47.5861807Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:47.5862389Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:47.5862847Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:47.5863400Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:47.5863867Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:47.5864313Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:47.5864793Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:15:47.5865280Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:47.5865765Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:15:47.5866427Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:15:47.5867107Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:15:47.5867552Z skip: Need at least 4 CUDA devices (3.922s) 2023-01-11T22:15:47.5868061Z test_2d_fsdp_integration_fsdp_nested (__main__.Test2dParallelIntegration) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44082 2023-01-11T22:15:47.5868621Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44083 2023-01-11T22:15:47.5869213Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:47.5869667Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:47.5870252Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:47.5870868Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:47.5871462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:47.5871910Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:47.5872487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:47.5873024Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:47.5873481Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:47.5873974Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:15:47.5874460Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:47.5874923Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 
2023-01-11T22:15:47.5875598Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:15:47.5876287Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:15:47.5876711Z skip: Need at least 4 CUDA devices (2.309s) 2023-01-11T22:15:47.5877239Z test_2d_fsdp_integration_fsdp_nested_param_groups (__main__.Test2dParallelIntegration) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44152 2023-01-11T22:15:47.5877810Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44153 2023-01-11T22:15:47.5878622Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:47.5879064Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:47.5879651Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:47.5880128Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:47.5880709Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:47.5881135Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:47.5881712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:47.5882180Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:47.5882602Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:47.5883098Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:15:47.5883585Z INFO:torch.testing._internal.common_distributed:Starting 
event listener thread for rank 0 2023-01-11T22:15:47.5884076Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:15:47.5884774Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:15:47.5885447Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:15:47.5885896Z skip: Need at least 4 CUDA devices (2.309s) 2023-01-11T22:15:47.5886412Z test_2d_fsdp_integration_functionality (__main__.Test2dParallelIntegration) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44222 2023-01-11T22:15:47.5886952Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44223 2023-01-11T22:15:47.5887565Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:47.5888017Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:47.5888692Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:47.5889147Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:47.5889728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:47.5890234Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:47.5890820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:47.5891268Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:47.5891709Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:47.5892202Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:15:47.5892673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:47.5893318Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:15:47.5893991Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:15:47.5894685Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:15:47.5895109Z skip: Need at least 4 CUDA devices (2.308s) 2023-01-11T22:15:47.5895623Z test_2d_fsdp_integration_use_orig_params (__main__.Test2dParallelIntegration) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44292 2023-01-11T22:15:47.5896185Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44293 2023-01-11T22:15:47.5896797Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:47.5897234Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:47.5897799Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:15:47.5898249Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:15:47.5898809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:47.5899282Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:15:47.5899869Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:15:47.5900343Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:15:47.5900761Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:15:47.5901258Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:15:47.5901741Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:15:47.5902204Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:15:47.5902863Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:15:47.5903551Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:15:47.5903995Z skip: Need at least 4 CUDA devices (2.308s) 2023-01-11T22:15:47.5904193Z 2023-01-11T22:15:47.5904447Z ---------------------------------------------------------------------- 2023-01-11T22:15:47.5904785Z Ran 5 tests in 13.156s 2023-01-11T22:15:47.5904953Z 2023-01-11T22:15:47.5905065Z OK (skipped=5) 2023-01-11T22:15:47.5905333Z 2023-01-11T22:15:47.5905461Z Generating XML reports... 2023-01-11T22:15:47.5906117Z Generated XML report: test-reports/python-unittest/distributed.tensor.parallel.test_2d_parallel/TEST-Test2dParallelIntegration-20230111221534.xml 2023-01-11T22:15:47.5906523Z 2023-01-11T22:15:47.5906855Z ##[endgroup] 2023-01-11T22:15:47.5907595Z FINISHED PRINTING LOG FILE of distributed/tensor/parallel/test_2d_parallel (/var/lib/jenkins/workspace/test/test-reports/distributed-tensor-parallel-test_2d_parallel_qm3q1chs) 2023-01-11T22:15:47.5907987Z 2023-01-11T22:15:47.5908309Z Running distributed/_shard/sharded_tensor/ops/test_tensor_ops ... 
[2023-01-11 22:15:47.585328] 2023-01-11T22:15:47.5909050Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_tensor_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:15:47.585586] 2023-01-11T22:16:03.1555034Z 2023-01-11T22:16:03.1555748Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_tensor_ops 2023-01-11T22:16:03.1556812Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_tensor_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_tensor_ops_gbu23pl1) 2023-01-11T22:16:03.1557319Z 2023-01-11T22:16:03.1557514Z Running tests... 2023-01-11T22:16:03.1558245Z ---------------------------------------------------------------------- 2023-01-11T22:16:03.1558869Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_tensor_ops 2023-01-11T22:16:03.1559353Z test_clone (__main__.TestTensorOps) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:16:03.1559804Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44397 2023-01-11T22:16:03.1560487Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44398 2023-01-11T22:16:03.1560941Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 44399 2023-01-11T22:16:03.1561387Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 44400 2023-01-11T22:16:03.1562023Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1562478Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1563037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1563511Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1564091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1564517Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1565090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1565554Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1566134Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1566555Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1567126Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1567590Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1568169Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1568596Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1569161Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1569621Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1570344Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:03.1570848Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:16:03.1571340Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:03.1571835Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:16:03.1572337Z skip: Need at least 4 CUDA devices (3.944s) 2023-01-11T22:16:03.1572837Z test_deep_copy (__main__.TestTensorOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44533 2023-01-11T22:16:03.1573633Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44534 2023-01-11T22:16:03.1574092Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 44535 2023-01-11T22:16:03.1574570Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 44536 2023-01-11T22:16:03.1575241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1575720Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1576307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1576804Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1577423Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1577885Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1578494Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1578988Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1579600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1580063Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1580671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1581168Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1581780Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:16:03.1582236Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1582842Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1583337Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1583783Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:03.1584286Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:16:03.1584786Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:16:03.1585286Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:03.1585685Z skip: Need at least 4 CUDA devices (2.310s) 2023-01-11T22:16:03.1586159Z test_detach (__main__.TestTensorOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44669 2023-01-11T22:16:03.1586691Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44670 2023-01-11T22:16:03.1587153Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 44671 2023-01-11T22:16:03.1587623Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 44672 2023-01-11T22:16:03.1588271Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1588860Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1589459Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1589956Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1590656Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:16:03.1591143Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1591742Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1592241Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1592851Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1593313Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1593920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1594413Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1595025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1595486Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1596093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1596591Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1597033Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:03.1597529Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:16:03.1598036Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:16:03.1598533Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:03.1598929Z skip: Need at least 4 CUDA devices (2.310s) 2023-01-11T22:16:03.1599415Z test_inplace_copy (__main__.TestTensorOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44805 2023-01-11T22:16:03.1599953Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44806 2023-01-11T22:16:03.1600411Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 44807 2023-01-11T22:16:03.1600885Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 44808 2023-01-11T22:16:03.1601527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1602010Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1602601Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1603095Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1603747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1604227Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1604816Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1605314Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1605923Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1606400Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1607094Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1607588Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1608197Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:16:03.1608716Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1609345Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1609840Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1610300Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:03.1610781Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:16:03.1611279Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:03.1611779Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:16:03.1612172Z skip: Need at least 4 CUDA devices (2.411s) 2023-01-11T22:16:03.1612660Z test_set_requires_grad (__main__.TestTensorOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44941 2023-01-11T22:16:03.1613468Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44942 2023-01-11T22:16:03.1613952Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 44943 2023-01-11T22:16:03.1614410Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 44944 2023-01-11T22:16:03.1615065Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1615548Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1616161Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1616639Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1617251Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: 
UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1617728Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1618323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1618821Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1619435Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1619910Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1620499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1621001Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1621613Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:03.1622069Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:03.1622672Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:03.1623171Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:03.1623635Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:16:03.1624118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:03.1624612Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:03.1625220Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:16:03.1625634Z skip: Need at least 4 CUDA devices (2.311s) 2023-01-11T22:16:03.1625822Z 2023-01-11T22:16:03.1626120Z 
---------------------------------------------------------------------- 2023-01-11T22:16:03.1626469Z Ran 5 tests in 13.287s 2023-01-11T22:16:03.1626641Z 2023-01-11T22:16:03.1626753Z OK (skipped=5) 2023-01-11T22:16:03.1626993Z 2023-01-11T22:16:03.1627115Z Generating XML reports... 2023-01-11T22:16:03.1627779Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_tensor_ops/TEST-TestTensorOps-20230111221549.xml 2023-01-11T22:16:03.1628167Z 2023-01-11T22:16:03.1628504Z ##[endgroup] 2023-01-11T22:16:03.1629203Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_tensor_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_tensor_ops_gbu23pl1) 2023-01-11T22:16:03.1629644Z 2023-01-11T22:16:03.1629929Z Running distributed/fsdp/test_fsdp_memory ... [2023-01-11 22:16:03.155635] 2023-01-11T22:16:03.1630645Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_memory.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:16:03.155947] 2023-01-11T22:16:19.7406717Z 2023-01-11T22:16:19.7407444Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_memory 2023-01-11T22:16:19.7408520Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_memory (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_memory_rh34cpse) 2023-01-11T22:16:19.7408916Z 2023-01-11T22:16:19.7409032Z Running tests... 2023-01-11T22:16:19.7409557Z ---------------------------------------------------------------------- 2023-01-11T22:16:19.7410099Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_memory 2023-01-11T22:16:19.7410605Z test_fsdp_memory_ckpt_ckpt (__main__.TestFSDPMemory) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:16:19.7411090Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45112 2023-01-11T22:16:19.7411532Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45113 2023-01-11T22:16:19.7412160Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:19.7412625Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:19.7413602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:19.7414064Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:19.7414648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:19.7415094Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:19.7415684Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:19.7416133Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:19.7416593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:16:19.7417088Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:16:19.7417732Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:19.7418431Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:16:19.7418950Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:19.7419421Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:19.7421119Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:16:19.7421923Z warnings.warn( 2023-01-11T22:16:19.7423071Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:16:19.7423848Z warnings.warn( 2023-01-11T22:16:19.7424102Z dist init r=0, world=2 2023-01-11T22:16:19.7424359Z dist init r=1, world=2 2023-01-11T22:16:19.7424584Z ok (8.040s) 2023-01-11T22:16:19.7425014Z test_fsdp_memory_ckpt_no_ckpt (__main__.TestFSDPMemory) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45225 2023-01-11T22:16:19.7425539Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45226 2023-01-11T22:16:19.7426137Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:19.7426595Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:19.7427172Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:19.7427641Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:19.7428201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:19.7428648Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:19.7429220Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:19.7429684Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:19.7430124Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:16:19.7430620Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:16:19.7431277Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:19.7431947Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:16:19.7432475Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:19.7432947Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:19.7434209Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:16:19.7435022Z warnings.warn( 2023-01-11T22:16:19.7436147Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:16:19.7437004Z warnings.warn( 2023-01-11T22:16:19.7437259Z dist init r=1, world=2 2023-01-11T22:16:19.7437509Z dist init r=0, world=2 2023-01-11T22:16:19.7437732Z ok (6.218s) 2023-01-11T22:16:19.7437884Z 2023-01-11T22:16:19.7438226Z ---------------------------------------------------------------------- 2023-01-11T22:16:19.7438570Z Ran 2 tests in 14.258s 2023-01-11T22:16:19.7438734Z 2023-01-11T22:16:19.7438810Z OK 2023-01-11T22:16:19.7438945Z 2023-01-11T22:16:19.7439073Z Generating XML reports... 
2023-01-11T22:16:19.7439675Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_memory/TEST-TestFSDPMemory-20230111221605.xml 2023-01-11T22:16:19.7440019Z 2023-01-11T22:16:19.7440319Z ##[endgroup] 2023-01-11T22:16:19.7440922Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_memory (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_memory_rh34cpse) 2023-01-11T22:16:19.7441285Z 2023-01-11T22:16:19.7441572Z Running distributed/test_c10d_object_collectives ... [2023-01-11 22:16:19.740771] 2023-01-11T22:16:19.7442272Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_c10d_object_collectives.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:16:19.741028] 2023-01-11T22:16:36.2462022Z 2023-01-11T22:16:36.2462656Z Expand the folded group to see the log file of distributed/test_c10d_object_collectives 2023-01-11T22:16:36.2463777Z ##[group]PRINTING LOG FILE of distributed/test_c10d_object_collectives (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_object_collectives_bbs7aqho) 2023-01-11T22:16:36.2464155Z 2023-01-11T22:16:36.2464274Z Running tests... 2023-01-11T22:16:36.2465050Z ---------------------------------------------------------------------- 2023-01-11T22:16:36.2465663Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_object_collectives 2023-01-11T22:16:36.2466162Z test_all_gather_object (__main__.TestObjectCollectives) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:16:36.2466884Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45373 2023-01-11T22:16:36.2467344Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45374 2023-01-11T22:16:36.2467982Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:36.2468657Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:36.2469252Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:36.2469732Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:36.2470520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:36.2470979Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:36.2471563Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:36.2472252Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:36.2472681Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:36.2473176Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:16:36.2473867Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:36.2474363Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:16:36.2475007Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:16:36.2476167Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:36.2476574Z ok (4.713s) 2023-01-11T22:16:36.2477017Z test_broadcast_object_list (__main__.TestObjectCollectives) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45452 2023-01-11T22:16:36.2477863Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45453 2023-01-11T22:16:36.2478520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:36.2479156Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:36.2479799Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:36.2480265Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:36.2481038Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:36.2481515Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:36.2482077Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:36.2482624Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:36.2483220Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:36.2483698Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:16:36.2484184Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:36.2484857Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:16:36.2485528Z 
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:36.2486378Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:36.2486804Z ok (3.110s) 2023-01-11T22:16:36.2487239Z test_gather_object (__main__.TestObjectCollectives) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45531 2023-01-11T22:16:36.2487749Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45532 2023-01-11T22:16:36.2488589Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:36.2489045Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:36.2489640Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:36.2490325Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:36.2490905Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:36.2491357Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:36.2492138Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:36.2492596Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:36.2493456Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:36.2494008Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:16:36.2494499Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:36.2494981Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:16:36.2495957Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:36.2496648Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:36.2497259Z ok (3.210s) 2023-01-11T22:16:36.2497789Z test_scatter_object_list (__main__.TestObjectCollectives) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45612 2023-01-11T22:16:36.2498336Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45613 2023-01-11T22:16:36.2499170Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:36.2499608Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:36.2500189Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:36.2500878Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:36.2501474Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:36.2501901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:36.2502674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:36.2503154Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:36.2503577Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:36.2504251Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:16:36.2504749Z 
INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:36.2505236Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:16:36.2506074Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:36.2506777Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:16:36.2507181Z ok (3.210s) 2023-01-11T22:16:36.2507335Z 2023-01-11T22:16:36.2507801Z ---------------------------------------------------------------------- 2023-01-11T22:16:36.2508127Z Ran 4 tests in 14.245s 2023-01-11T22:16:36.2508291Z 2023-01-11T22:16:36.2508386Z OK 2023-01-11T22:16:36.2508521Z 2023-01-11T22:16:36.2508647Z Generating XML reports... 2023-01-11T22:16:36.2509362Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_object_collectives/TEST-TestObjectCollectives-20230111221621.xml 2023-01-11T22:16:36.2509811Z 2023-01-11T22:16:36.2510156Z ##[endgroup] 2023-01-11T22:16:36.2510790Z FINISHED PRINTING LOG FILE of distributed/test_c10d_object_collectives (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_object_collectives_bbs7aqho) 2023-01-11T22:16:36.2511372Z 2023-01-11T22:16:36.2511656Z Running distributed/_tensor/test_tp_sharding_ops ... [2023-01-11 22:16:36.246333] 2023-01-11T22:16:36.2512364Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_tensor/test_tp_sharding_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:16:36.246587] 2023-01-11T22:16:54.2915003Z 2023-01-11T22:16:54.2915540Z Expand the folded group to see the log file of distributed/_tensor/test_tp_sharding_ops 2023-01-11T22:16:54.2916556Z ##[group]PRINTING LOG FILE of distributed/_tensor/test_tp_sharding_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_tensor-test_tp_sharding_ops_qp6mmr3f) 2023-01-11T22:16:54.2916929Z 2023-01-11T22:16:54.2917048Z Running tests... 2023-01-11T22:16:54.2917544Z ---------------------------------------------------------------------- 2023-01-11T22:16:54.2918484Z Test results will be stored in test-reports/python-unittest/distributed._tensor.test_tp_sharding_ops 2023-01-11T22:16:54.2919370Z test_replicated_permute (__main__.TPShardingOpsTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:16:54.2919856Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45728 2023-01-11T22:16:54.2920319Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45729 2023-01-11T22:16:54.2920909Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 45730 2023-01-11T22:16:54.2921675Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 45731 2023-01-11T22:16:54.2922406Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2922862Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2923422Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2923905Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2924486Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2924913Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2925496Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2925966Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2926543Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2926969Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2927540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2928011Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2928568Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2929011Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2929579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2930047Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2930468Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:16:54.2930945Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:16:54.2931415Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:54.2931883Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:54.2932266Z skip: Need at least 4 CUDA devices (4.003s) 2023-01-11T22:16:54.2932735Z test_sharded_cat (__main__.TPShardingOpsTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45864 2023-01-11T22:16:54.2933676Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45865 2023-01-11T22:16:54.2934106Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 45866 2023-01-11T22:16:54.2934547Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 45867 2023-01-11T22:16:54.2935168Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2935620Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2936175Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2936818Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2937412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2937838Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2938411Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2938957Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2939556Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2939981Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2940552Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2941015Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2941579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:16:54.2942024Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2942597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2943066Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2943489Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:16:54.2943962Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:54.2944428Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:54.2944897Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:16:54.2945278Z skip: Need at least 4 CUDA devices (2.311s) 2023-01-11T22:16:54.2945755Z test_sharded_permute (__main__.TPShardingOpsTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46000 2023-01-11T22:16:54.2946278Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46001 2023-01-11T22:16:54.2946704Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 46002 2023-01-11T22:16:54.2947146Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 46003 2023-01-11T22:16:54.2947761Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2948218Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2948775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2949250Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2949836Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: 
UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2950260Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2950831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2951295Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2951875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2952298Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2952871Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2953330Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2953908Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2954404Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2954979Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2955440Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2955910Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:16:54.2956396Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:16:54.2956863Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:54.2957333Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:54.2957706Z skip: Need at least 4 CUDA devices (2.410s) 2023-01-11T22:16:54.2958176Z test_sharded_split (__main__.TPShardingOpsTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46136 2023-01-11T22:16:54.2958695Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46137 2023-01-11T22:16:54.2959121Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 46138 2023-01-11T22:16:54.2959563Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 46139 2023-01-11T22:16:54.2960179Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2960636Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2961193Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2961660Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2962239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2962668Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2963242Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2963706Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2964283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2964708Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2965282Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2965751Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2966324Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:16:54.2966751Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2967319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2967786Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2968201Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:16:54.2968678Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:16:54.2969140Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:54.2969604Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:54.2969979Z skip: Need at least 4 CUDA devices (2.310s) 2023-01-11T22:16:54.2970453Z test_sharded_transpose (__main__.TPShardingOpsTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46272 2023-01-11T22:16:54.2971045Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46273 2023-01-11T22:16:54.2971473Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 46274 2023-01-11T22:16:54.2971915Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 46275 2023-01-11T22:16:54.2972570Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2973353Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2974018Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2974485Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2975069Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: 
UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2975518Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2976072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2976538Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2977119Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2977544Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2978118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2978580Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2979159Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2979579Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2980153Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2980612Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2981030Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:16:54.2981505Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:54.2981966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:16:54.2982432Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:54.2982806Z skip: Need at least 4 CUDA devices (2.411s) 2023-01-11T22:16:54.2983270Z test_sharded_view (__main__.TPShardingOpsTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46408 2023-01-11T22:16:54.2983797Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46409 2023-01-11T22:16:54.2984224Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 46410 2023-01-11T22:16:54.2984666Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 46411 2023-01-11T22:16:54.2985271Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2985722Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2986276Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2986743Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2987319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2987762Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2988429Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2988893Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2989466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:16:54.2989953Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2990546Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2991008Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2991585Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:16:54.2992006Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:16:54.2992584Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:16:54.2993045Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:16:54.2993462Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:16:54.2993935Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:16:54.2994401Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:16:54.2994867Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:16:54.2995240Z skip: Need at least 4 CUDA devices (2.311s) 2023-01-11T22:16:54.2995438Z 2023-01-11T22:16:54.2995718Z ---------------------------------------------------------------------- 2023-01-11T22:16:54.2996052Z Ran 6 tests in 15.757s 2023-01-11T22:16:54.2996215Z 2023-01-11T22:16:54.2996307Z OK (skipped=6) 2023-01-11T22:16:54.2996470Z 2023-01-11T22:16:54.2996599Z Generating XML reports... 2023-01-11T22:16:54.2997206Z Generated XML report: test-reports/python-unittest/distributed._tensor.test_tp_sharding_ops/TEST-TPShardingOpsTest-20230111221638.xml 2023-01-11T22:16:54.2997562Z 2023-01-11T22:16:54.2997896Z ##[endgroup] 2023-01-11T22:16:54.2998499Z FINISHED PRINTING LOG FILE of distributed/_tensor/test_tp_sharding_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_tensor-test_tp_sharding_ops_qp6mmr3f) 2023-01-11T22:16:54.2998862Z 2023-01-11T22:16:54.2999153Z Running distributed/tensor/parallel/test_tp_style ... [2023-01-11 22:16:54.291653] 2023-01-11T22:16:54.2999862Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/tensor/parallel/test_tp_style.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:16:54.291912] 2023-01-11T22:17:17.2655012Z 2023-01-11T22:17:17.2655745Z Expand the folded group to see the log file of distributed/tensor/parallel/test_tp_style 2023-01-11T22:17:17.2658162Z ##[group]PRINTING LOG FILE of distributed/tensor/parallel/test_tp_style (/var/lib/jenkins/workspace/test/test-reports/distributed-tensor-parallel-test_tp_style_sk5lcmrh) 2023-01-11T22:17:17.2658580Z 2023-01-11T22:17:17.2658697Z Running tests... 2023-01-11T22:17:17.2661990Z ---------------------------------------------------------------------- 2023-01-11T22:17:17.2662662Z Test results will be stored in test-reports/python-unittest/distributed.tensor.parallel.test_tp_style 2023-01-11T22:17:17.2663259Z test_colwise_parallel_style (__main__.TensorParallelStyleTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:17:17.2663763Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46579 2023-01-11T22:17:17.2664406Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46580 2023-01-11T22:17:17.2664861Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 46581 2023-01-11T22:17:17.2665310Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 46582 2023-01-11T22:17:17.2666198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2666636Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2667214Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2667785Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2668397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2669660Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T22:17:17.2670435Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2670924Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2671564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2672028Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2672591Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2673124Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2673707Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2674139Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2674714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2675190Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2675630Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:17.2676100Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:17.2676573Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:17:17.2677030Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:17:17.2677414Z skip: Need at least 4 CUDA devices (4.030s) 2023-01-11T22:17:17.2677912Z test_make_input_replicate_1d (__main__.TensorParallelStyleTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46715 2023-01-11T22:17:17.2678451Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46716 2023-01-11T22:17:17.2678900Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 46717 2023-01-11T22:17:17.2679323Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 46718 2023-01-11T22:17:17.2679936Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2680387Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2680959Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2681412Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2681996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2682442Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2682997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2683461Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2684035Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2684616Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2685175Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2685641Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2686279Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:17:17.2686716Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2687290Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2687751Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2688187Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:17:17.2688648Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:17.2689105Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:17.2689570Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:17:17.2689949Z skip: Need at least 4 CUDA devices (2.310s) 2023-01-11T22:17:17.2690451Z test_make_input_shard_1d (__main__.TensorParallelStyleTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46851 2023-01-11T22:17:17.2691058Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46852 2023-01-11T22:17:17.2691512Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 46853 2023-01-11T22:17:17.2691959Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 46854 2023-01-11T22:17:17.2692556Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2693651Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2694246Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2694708Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2695289Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2695737Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2696309Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2696757Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2697332Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2697784Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2698341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2698806Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2699386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2699828Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2700380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2700842Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2701281Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:17:17.2701753Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:17:17.2702354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:17.2702817Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:17.2703212Z skip: Need at least 4 CUDA devices (2.410s) 
2023-01-11T22:17:17.2703762Z test_make_output_replicate_1d (__main__.TensorParallelStyleTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46987 2023-01-11T22:17:17.2704318Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46988 2023-01-11T22:17:17.2704770Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 46989 2023-01-11T22:17:17.2705214Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 46990 2023-01-11T22:17:17.2705819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2706283Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2706856Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2707306Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2707894Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2708341Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2708912Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2709357Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2709935Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2710383Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2710935Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2711400Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2711974Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2712420Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2712974Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2713439Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2713873Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:17.2714347Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:17:17.2714800Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:17:17.2715266Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:17.2715660Z skip: Need at least 4 CUDA devices (2.411s) 2023-01-11T22:17:17.2716129Z test_make_output_shard_1d (__main__.TensorParallelStyleTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47123 2023-01-11T22:17:17.2716667Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47124 2023-01-11T22:17:17.2717114Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 47125 2023-01-11T22:17:17.2717558Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 47126 2023-01-11T22:17:17.2718150Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2718672Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2719251Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2719705Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2720281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2720776Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2721365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2721811Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2722386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2722828Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2723405Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2723849Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2724423Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:17:17.2724863Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2725417Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2725877Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2726314Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:17:17.2726786Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:17:17.2727239Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:17.2727696Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:17.2728087Z skip: Need at least 4 CUDA devices (2.311s) 2023-01-11T22:17:17.2728556Z test_make_output_tensor (__main__.TensorParallelStyleTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47259 2023-01-11T22:17:17.2729093Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47260 2023-01-11T22:17:17.2729541Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 47261 2023-01-11T22:17:17.2729983Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 47262 2023-01-11T22:17:17.2730577Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2731026Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2731604Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2732073Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2732633Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2733466Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2734049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2734495Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2735070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2735518Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2736088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2736633Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2737215Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2737658Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2738305Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2738785Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2739220Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:17:17.2739693Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:17.2740138Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:17:17.2740605Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:17.2741002Z skip: Need at least 4 CUDA devices (2.411s) 
2023-01-11T22:17:17.2741475Z test_prepare_output_error (__main__.TensorParallelStyleTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47395 2023-01-11T22:17:17.2742016Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47396 2023-01-11T22:17:17.2742471Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 47397 2023-01-11T22:17:17.2742919Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 47398 2023-01-11T22:17:17.2743518Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2743968Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2744540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2745013Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2745571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2746015Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2746588Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2747034Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2747611Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2748055Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2748619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2749065Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2749647Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2750092Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2750645Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2751107Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2751545Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:17.2752011Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:17.2752457Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:17:17.2752914Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:17:17.2753388Z skip: Need at least 4 CUDA devices (2.410s) 2023-01-11T22:17:17.2753880Z test_rowwise_parallel_style (__main__.TensorParallelStyleTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47531 2023-01-11T22:17:17.2754405Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47532 2023-01-11T22:17:17.2754897Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 47533 2023-01-11T22:17:17.2755349Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 47534 2023-01-11T22:17:17.2755946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2756394Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2756969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2757443Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2758000Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2758444Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2759016Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2759463Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2760037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:17.2760481Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2761047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2761494Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2762068Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:17:17.2762510Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:17.2763076Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:17.2763521Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:17.2763956Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:17:17.2764427Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:17:17.2764873Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:17.2765327Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:17.2765727Z skip: Need at least 4 CUDA devices (2.411s) 2023-01-11T22:17:17.2765922Z 2023-01-11T22:17:17.2766198Z ---------------------------------------------------------------------- 2023-01-11T22:17:17.2766512Z Ran 8 tests in 20.706s 2023-01-11T22:17:17.2766675Z 2023-01-11T22:17:17.2766785Z OK (skipped=8) 2023-01-11T22:17:17.2766941Z 2023-01-11T22:17:17.2767066Z Generating XML reports... 2023-01-11T22:17:17.2767703Z Generated XML report: test-reports/python-unittest/distributed.tensor.parallel.test_tp_style/TEST-TensorParallelStyleTest-20230111221656.xml 2023-01-11T22:17:17.2768101Z 2023-01-11T22:17:17.2768558Z ##[endgroup] 2023-01-11T22:17:17.2769206Z FINISHED PRINTING LOG FILE of distributed/tensor/parallel/test_tp_style (/var/lib/jenkins/workspace/test/test-reports/distributed-tensor-parallel-test_tp_style_sk5lcmrh) 2023-01-11T22:17:17.2769592Z 2023-01-11T22:17:17.2769870Z Running distributed/_tensor/test_redistribute ... [2023-01-11 22:17:17.265749] 2023-01-11T22:17:17.2770623Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_tensor/test_redistribute.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:17:17.266005] 2023-01-11T22:17:43.4397456Z 2023-01-11T22:17:43.4399972Z Expand the folded group to see the log file of distributed/_tensor/test_redistribute 2023-01-11T22:17:43.4401200Z ##[group]PRINTING LOG FILE of distributed/_tensor/test_redistribute (/var/lib/jenkins/workspace/test/test-reports/distributed-_tensor-test_redistribute_cohg734g) 2023-01-11T22:17:43.4401601Z 2023-01-11T22:17:43.4401780Z Running tests... 2023-01-11T22:17:43.4402643Z ---------------------------------------------------------------------- 2023-01-11T22:17:43.4404486Z Test results will be stored in test-reports/python-unittest/distributed._tensor.test_redistribute 2023-01-11T22:17:43.4405041Z test_multi_dim_mesh (__main__.MultiDimRedistributeTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:17:43.4406444Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47702 2023-01-11T22:17:43.4406935Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47703 2023-01-11T22:17:43.4407396Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 47704 2023-01-11T22:17:43.4407838Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 47705 2023-01-11T22:17:43.4408264Z INFO:torch.testing._internal.common_distributed:Started process 4 with pid 47706 2023-01-11T22:17:43.4408705Z INFO:torch.testing._internal.common_distributed:Started process 5 with pid 47707 2023-01-11T22:17:43.4409169Z INFO:torch.testing._internal.common_distributed:Started process 6 with pid 47708 2023-01-11T22:17:43.4409610Z INFO:torch.testing._internal.common_distributed:Started process 7 with pid 47709 2023-01-11T22:17:43.4410253Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4410710Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4411308Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4412689Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4413773Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4414251Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4414838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4415299Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4415884Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4416342Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4416934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4417401Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4417978Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4418429Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4419028Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4419478Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4420054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4420510Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4421065Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4421746Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4422336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4422794Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4423444Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4423943Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4424536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4424988Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4425629Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4426105Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4426665Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4427109Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4427682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4428136Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4428569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 7 2023-01-11T22:17:43.4429044Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:43.4429510Z INFO:torch.testing._internal.common_distributed:Starting 
event listener thread for rank 2 2023-01-11T22:17:43.4429962Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 4 2023-01-11T22:17:43.4430417Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:43.4430884Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 6 2023-01-11T22:17:43.4431347Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 5 2023-01-11T22:17:43.4431789Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:17:43.4432175Z skip: Need at least 8 CUDA devices (4.212s) 2023-01-11T22:17:43.4432675Z test_partial_to_replicate_forward_backward (__main__.RedistributeTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47974 2023-01-11T22:17:43.4433200Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47975 2023-01-11T22:17:43.4433814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4434269Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4434843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4435291Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4435874Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4436320Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4436873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4437336Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4437777Z 
INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:43.4438347Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:43.4438817Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:43.4439302Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:17:43.4440066Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:17:43.4440773Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:17:43.4441153Z ok (3.211s) 2023-01-11T22:17:43.4441578Z test_partial_to_shard (__main__.RedistributeTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48057 2023-01-11T22:17:43.4442093Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48058 2023-01-11T22:17:43.4442678Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4443135Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4443712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4444179Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4444744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4445197Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4445769Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4446231Z 
warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4446652Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:43.4447142Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:43.4447626Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:43.4448088Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:17:43.4448746Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:17:43.4449440Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:17:43.4449831Z ok (3.211s) 2023-01-11T22:17:43.4450243Z test_replicate_to_partial (__main__.RedistributeTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48136 2023-01-11T22:17:43.4450765Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48137 2023-01-11T22:17:43.4451377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4451810Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4452386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4453213Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4453823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4454250Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4454820Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4455289Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4455849Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:43.4456323Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:43.4456807Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:43.4457291Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:17:43.4458009Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:17:43.4458721Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:17:43.4459249Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:17:43.4459740Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:17:43.4460380Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:17:43.4460906Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:17:43.4461550Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:17:43.4462082Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:17:43.4462711Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 
2023-01-11T22:17:43.4463387Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:17:43.4463914Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:17:43.4464406Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:17:43.4465032Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:17:43.4465711Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:17:43.4466099Z ok (3.412s) 2023-01-11T22:17:43.4466540Z test_replicate_to_replicate_forward_backward (__main__.RedistributeTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48234 2023-01-11T22:17:43.4467093Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48235 2023-01-11T22:17:43.4467700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4468148Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4468697Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4469146Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4469721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4470186Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4470753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4471219Z warnings.warn(f"loaded 
{len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4471658Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:43.4472126Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:43.4472607Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:43.4473166Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:17:43.4473825Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:17:43.4474493Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:17:43.4474940Z ok (3.311s) 2023-01-11T22:17:43.4475396Z test_replicate_to_shard_forward_backward (__main__.RedistributeTest) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48317 2023-01-11T22:17:43.4475937Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48318 2023-01-11T22:17:43.4476538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4476991Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4477569Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4478021Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4478602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4479051Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4479624Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4480071Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4480514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:43.4480999Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:43.4481472Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:43.4481952Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:17:43.4482610Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:17:43.4483298Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:17:43.4483673Z ok (3.211s) 2023-01-11T22:17:43.4484122Z test_shard_to_replicate_forward_backward (__main__.RedistributeTest) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48400 2023-01-11T22:17:43.4484661Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48401 2023-01-11T22:17:43.4485268Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4485708Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4486287Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4486752Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4487314Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:17:43.4487761Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:17:43.4488332Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:17:43.4488797Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:17:43.4489214Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:17:43.4489701Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:17:43.4490258Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:17:43.4490719Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:17:43.4491376Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:17:43.4492119Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:17:43.4492518Z ok (3.312s) 2023-01-11T22:17:43.4492669Z 2023-01-11T22:17:43.4493399Z ---------------------------------------------------------------------- 2023-01-11T22:17:43.4493743Z Ran 7 tests in 23.881s 2023-01-11T22:17:43.4493908Z 2023-01-11T22:17:43.4494020Z OK (skipped=1) 2023-01-11T22:17:43.4494175Z 2023-01-11T22:17:43.4494303Z Generating XML reports... 2023-01-11T22:17:43.4494895Z Generated XML report: test-reports/python-unittest/distributed._tensor.test_redistribute/TEST-RedistributeTest-20230111221719.xml 2023-01-11T22:17:43.4495677Z Generated XML report: test-reports/python-unittest/distributed._tensor.test_redistribute/TEST-MultiDimRedistributeTest-20230111221719.xml 2023-01-11T22:17:43.4496048Z 2023-01-11T22:17:43.4496504Z ##[endgroup] 2023-01-11T22:17:43.4497110Z FINISHED PRINTING LOG FILE of distributed/_tensor/test_redistribute (/var/lib/jenkins/workspace/test/test-reports/distributed-_tensor-test_redistribute_cohg734g) 2023-01-11T22:17:43.4497472Z 2023-01-11T22:17:43.4497781Z Running distributed/_shard/sharded_tensor/ops/test_matrix_ops ... [2023-01-11 22:17:43.440011] 2023-01-11T22:17:43.4498518Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_matrix_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:17:43.440317] 2023-01-11T22:18:13.5076545Z 2023-01-11T22:18:13.5077299Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_matrix_ops 2023-01-11T22:18:13.5078460Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_matrix_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_matrix_ops_zxfrvq4t) 2023-01-11T22:18:13.5078866Z 2023-01-11T22:18:13.5080716Z Running tests... 
2023-01-11T22:18:13.5081669Z ---------------------------------------------------------------------- 2023-01-11T22:18:13.5084949Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_matrix_ops 2023-01-11T22:18:13.5085767Z test_sharded_tensor_contiguous (__main__.TestShardedTensorMatrixOps) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:13.5086354Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48520 2023-01-11T22:18:13.5087195Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48521 2023-01-11T22:18:13.5087827Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 48522 2023-01-11T22:18:13.5088767Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 48523 2023-01-11T22:18:13.5089972Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5090477Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5091064Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5091947Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5093330Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5094197Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5094941Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5096069Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5097221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5098085Z warnings.warn(f"loaded 
{len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5099310Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5100162Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5101015Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5101449Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5102459Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5103118Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5104019Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:13.5104857Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:13.5105708Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:13.5106522Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:13.5107152Z skip: Need at least 4 CUDA devices (4.019s) 2023-01-11T22:18:13.5107989Z test_sharded_tensor_layer_norm (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48656 2023-01-11T22:18:13.5108654Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48657 2023-01-11T22:18:13.5109110Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 48658 2023-01-11T22:18:13.5109543Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 48659 2023-01-11T22:18:13.5110169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5110625Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5111206Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5111661Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5112243Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5112690Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5113240Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5113712Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5114294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5114735Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5115280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5115747Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5116326Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:18:13.5116764Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5117319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5117787Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5118346Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:13.5118801Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:13.5119262Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:13.5119786Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:13.5120192Z skip: Need at least 4 CUDA devices (2.311s) 2023-01-11T22:18:13.5120682Z test_sharded_tensor_layer_norm_error (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48792 2023-01-11T22:18:13.5121237Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48793 2023-01-11T22:18:13.5121687Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 48794 2023-01-11T22:18:13.5122113Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 48795 2023-01-11T22:18:13.5122854Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5123287Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5123852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5124300Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5124873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5125323Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5125904Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5126371Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5126934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5127379Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5127951Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5128420Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5128974Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:18:13.5129419Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5129986Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5130448Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5130869Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:13.5131340Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:13.5131804Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:13.5132248Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:13.5132640Z skip: Need at least 4 CUDA devices (2.411s) 2023-01-11T22:18:13.5133585Z test_sharded_tensor_masked_fill (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48928 2023-01-11T22:18:13.5134143Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48929 2023-01-11T22:18:13.5134569Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 48930 2023-01-11T22:18:13.5135006Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 48931 2023-01-11T22:18:13.5135766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5136200Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5136774Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5137320Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5137922Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5138349Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5138920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5139385Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5139965Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5140389Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5141000Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5141472Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5142035Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:18:13.5142479Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5143051Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5143515Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5143931Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:13.5144399Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:13.5144871Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:13.5145312Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:13.5145705Z skip: Need at least 4 CUDA devices (2.410s) 2023-01-11T22:18:13.5146221Z test_sharded_tensor_masked_fill_error (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49064 2023-01-11T22:18:13.5146775Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49065 2023-01-11T22:18:13.5147198Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 49066 2023-01-11T22:18:13.5147633Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 49067 2023-01-11T22:18:13.5148251Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5148702Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5149260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5149730Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5150313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5150739Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5151312Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5151774Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5152350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5152861Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5153439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5153904Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5154540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:18:13.5154999Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5155579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5156035Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5156454Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:13.5156930Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:13.5157398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:13.5157842Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:13.5158232Z skip: Need at least 4 CUDA devices (2.411s) 2023-01-11T22:18:13.5158734Z test_sharded_tensor_softmax (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49200 2023-01-11T22:18:13.5159284Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49201 2023-01-11T22:18:13.5159710Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 49202 2023-01-11T22:18:13.5160143Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 49203 2023-01-11T22:18:13.5160752Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5161209Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5161764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5162234Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5162816Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5163243Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5163809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5164275Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5164850Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5165276Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5165842Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5177881Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5178551Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5178995Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5179583Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5180055Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5180479Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:13.5180960Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:13.5181602Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:13.5182077Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:13.5182451Z skip: Need at least 4 CUDA devices (2.411s) 
2023-01-11T22:18:13.5183050Z test_sharded_tensor_transpose (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49336 2023-01-11T22:18:13.5183625Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49337 2023-01-11T22:18:13.5184080Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 49338 2023-01-11T22:18:13.5184509Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 49339 2023-01-11T22:18:13.5185131Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5185591Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5186154Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5186627Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5187215Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5187663Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5188216Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5188683Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5189259Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5189683Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5190253Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5190715Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5191294Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5191721Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5192296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5192758Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5193198Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:13.5193652Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:13.5194119Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:13.5194588Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:13.5194964Z skip: Need at least 4 CUDA devices (2.411s) 2023-01-11T22:18:13.5195483Z test_sharded_tensor_transpose_error (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49472 2023-01-11T22:18:13.5196045Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49473 2023-01-11T22:18:13.5196497Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 49474 2023-01-11T22:18:13.5196926Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 49475 2023-01-11T22:18:13.5197539Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5198078Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5198638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5199111Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5199692Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5200193Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5200761Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5201231Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5201810Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5202255Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5202809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5203270Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5203846Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:18:13.5204273Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5204845Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5205305Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5205739Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:13.5206192Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:13.5206653Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:13.5207123Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:13.5207498Z skip: Need at least 4 CUDA devices (2.410s) 2023-01-11T22:18:13.5208000Z test_sharded_tensor_type_as (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49608 2023-01-11T22:18:13.5208556Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49609 2023-01-11T22:18:13.5209010Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 49610 2023-01-11T22:18:13.5209438Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 49611 2023-01-11T22:18:13.5210042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5210494Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5211060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5211533Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5212116Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5212566Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5213416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5213894Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5214475Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5214920Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5215470Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5216066Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5216654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5217080Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5217719Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5218199Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5218635Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:13.5219090Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:13.5219549Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:13.5220024Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:13.5220396Z skip: Need at least 4 CUDA devices (2.310s) 
2023-01-11T22:18:13.5220894Z test_sharded_tensor_view (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49744 2023-01-11T22:18:13.5221443Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49745 2023-01-11T22:18:13.5221896Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 49746 2023-01-11T22:18:13.5222324Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 49747 2023-01-11T22:18:13.5222941Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5223393Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5223969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5224429Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5225009Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5225454Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5226011Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5226480Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5227060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5227506Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5228050Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5228508Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5229088Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5229537Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5230126Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5230594Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5231034Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:13.5231487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:13.5231949Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:13.5232495Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:13.5232871Z skip: Need at least 4 CUDA devices (2.411s) 2023-01-11T22:18:13.5233375Z test_sharded_tensor_view_error (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49880 2023-01-11T22:18:13.5233926Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49881 2023-01-11T22:18:13.5234426Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 49882 2023-01-11T22:18:13.5234868Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 49883 2023-01-11T22:18:13.5235483Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5235929Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5236504Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5236957Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5237536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5237980Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5238533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5239000Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5239574Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:13.5240017Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5240564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5241031Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5241640Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 
76 slow tests 2023-01-11T22:18:13.5242088Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:13.5242643Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:13.5243110Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:13.5243552Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:18:13.5244007Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:18:13.5244466Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:13.5244934Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:18:13.5245333Z skip: Need at least 4 CUDA devices (2.310s) 2023-01-11T22:18:13.5245509Z 2023-01-11T22:18:13.5245791Z ---------------------------------------------------------------------- 2023-01-11T22:18:13.5246130Z Ran 11 tests in 27.826s 2023-01-11T22:18:13.5246299Z 2023-01-11T22:18:13.5246414Z OK (skipped=11) 2023-01-11T22:18:13.5246573Z 2023-01-11T22:18:13.5246683Z Generating XML reports... 2023-01-11T22:18:13.5247362Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_matrix_ops/TEST-TestShardedTensorMatrixOps-20230111221745.xml 2023-01-11T22:18:13.5247772Z 2023-01-11T22:18:13.5248266Z ##[endgroup] 2023-01-11T22:18:13.5248957Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_matrix_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_matrix_ops_zxfrvq4t) 2023-01-11T22:18:13.5249355Z 2023-01-11T22:18:13.5249650Z Running distributed/fsdp/test_fsdp_flatten_params ... 
[2023-01-11 22:18:13.508121] 2023-01-11T22:18:13.5250434Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_flatten_params.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:18:13.508408] 2023-01-11T22:18:46.6837365Z 2023-01-11T22:18:46.6837881Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_flatten_params 2023-01-11T22:18:46.6841183Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_flatten_params (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_flatten_params_tilje5p3) 2023-01-11T22:18:46.6841635Z 2023-01-11T22:18:46.6841736Z Running tests... 2023-01-11T22:18:46.6842262Z ---------------------------------------------------------------------- 2023-01-11T22:18:46.6843262Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_flatten_params 2023-01-11T22:18:46.6843721Z test_empty_module (__main__.TestFlattenParams) 2023-01-11T22:18:46.6844194Z Tests flattening an empty module (i.e. one without any parameters). ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:18:46.6844687Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50051 2023-01-11T22:18:46.6852588Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:46.6853469Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:46.6854091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:46.6854574Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:46.6855028Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:46.6855699Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
2023-01-11T22:18:46.6856252Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:46.6857033Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:18:46.6857507Z warnings.warn( 2023-01-11T22:18:46.6857765Z dist init r=0, world=1 2023-01-11T22:18:46.6858040Z ok (4.655s) 2023-01-11T22:18:46.6858363Z test_flat_param_shard_metadata (__main__.TestFlattenParams) 2023-01-11T22:18:46.6858876Z Tests that ``FlatParameter`` shard metadata are computed as expected. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50091 2023-01-11T22:18:46.6859591Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:46.6860054Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:46.6860618Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:46.6861099Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:46.6861561Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:46.6862223Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:18:46.6862733Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:46.6863096Z dist init r=0, world=1 2023-01-11T22:18:46.6863349Z ok (3.112s) 2023-01-11T22:18:46.6863633Z test_flatten_nothing (__main__.TestFlattenParams) 2023-01-11T22:18:46.6864190Z Tests that constructing a ``FlatParamHandle`` with no parameters ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50131 2023-01-11T22:18:46.6864892Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:46.6865550Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:46.6866142Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:46.6866600Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:46.6867149Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:46.6867839Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:18:46.6868369Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:46.6868711Z dist init r=0, world=1 2023-01-11T22:18:46.6868961Z ok (3.009s) 2023-01-11T22:18:46.6869280Z test_numel_with_shared_params (__main__.TestFlattenParams) 2023-01-11T22:18:46.6869789Z Tests that numel is preserved after flattening when there are shared ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50171 2023-01-11T22:18:46.6870489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:46.6870944Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:46.6871525Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:46.6871979Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:46.6872439Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:46.6873098Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:18:46.6873603Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:46.6873968Z dist init r=0, world=1 2023-01-11T22:18:46.6874215Z ok (3.009s) 2023-01-11T22:18:46.6874535Z test_numel_without_shared_params (__main__.TestFlattenParams) 2023-01-11T22:18:46.6875043Z Tests that numel is preserved after flattening when there are no shared ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50211 2023-01-11T22:18:46.6875753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:46.6876207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:46.6876767Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:46.6877237Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:46.6877692Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:46.6878355Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:18:46.6878866Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:46.6879230Z dist init r=0, world=1 2023-01-11T22:18:46.6879477Z ok (3.110s) 2023-01-11T22:18:46.6879776Z test_output_with_shared_params (__main__.TestFlattenParams) 2023-01-11T22:18:46.6880307Z Tests a forward pass after flattening when there are shared parameters ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50251 2023-01-11T22:18:46.6881005Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:46.6881459Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:46.6882017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:46.6882574Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:46.6883033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:46.6883699Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:18:46.6884205Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:46.6885040Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:18:46.6885535Z warnings.warn( 2023-01-11T22:18:46.6885770Z dist init r=0, world=1 2023-01-11T22:18:46.6886017Z ok (3.511s) 2023-01-11T22:18:46.6886335Z test_output_without_shared_params (__main__.TestFlattenParams) 2023-01-11T22:18:46.6886837Z Tests a forward pass after flattening when there are no shared ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50291 2023-01-11T22:18:46.6887537Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:46.6887991Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:46.6888573Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:46.6889030Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:46.6889489Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:46.6890151Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:18:46.6890675Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:46.6891434Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:18:46.6891918Z warnings.warn( 2023-01-11T22:18:46.6892173Z dist init r=0, world=1 2023-01-11T22:18:46.6892397Z ok (3.611s) 2023-01-11T22:18:46.6892701Z test_partial_flattening (__main__.TestFlattenParams) 2023-01-11T22:18:46.6893572Z Tests flattening some submodules but not others. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50331 2023-01-11T22:18:46.6894251Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:46.6894681Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:46.6895260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:46.6895730Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:46.6896173Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:46.6896834Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:18:46.6897357Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:46.6898136Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:18:46.6898604Z warnings.warn( 2023-01-11T22:18:46.6899764Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:18:46.6900727Z warnings.warn( 2023-01-11T22:18:46.6900983Z dist init r=0, world=1 2023-01-11T22:18:46.6901230Z ok (3.109s) 2023-01-11T22:18:46.6901543Z test_pnorm_after_step_with_shared_params (__main__.TestFlattenParams) 2023-01-11T22:18:46.6902164Z Tests for parameter Frobenius norm parity after an optimizer step when ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50371 2023-01-11T22:18:46.6902893Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:18:46.6903354Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:18:46.6903915Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:18:46.6904393Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:18:46.6904850Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:18:46.6905489Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:18:46.6906013Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:18:46.6906795Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:18:46.6907289Z warnings.warn( 2023-01-11T22:18:46.6907524Z dist init r=0, world=1 2023-01-11T22:18:46.6907771Z ok (3.710s) 2023-01-11T22:18:46.6907923Z 2023-01-11T22:18:46.6908200Z ---------------------------------------------------------------------- 2023-01-11T22:18:46.6908514Z Ran 9 tests in 30.836s 2023-01-11T22:18:46.6908683Z 2023-01-11T22:18:46.6908781Z OK 2023-01-11T22:18:46.6908917Z 2023-01-11T22:18:46.6909045Z Generating XML reports... 
2023-01-11T22:18:46.6909662Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_flatten_params/TEST-TestFlattenParams-20230111221815.xml 2023-01-11T22:18:46.6910007Z 2023-01-11T22:18:46.6910396Z ##[endgroup] 2023-01-11T22:18:46.6911041Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_flatten_params (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_flatten_params_tilje5p3) 2023-01-11T22:18:46.6911420Z 2023-01-11T22:18:46.6911702Z Running distributed/fsdp/test_fsdp_exec_order ... [2023-01-11 22:18:46.683924] 2023-01-11T22:18:46.6912373Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_exec_order.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:18:46.684182] 2023-01-11T22:19:21.1147943Z 2023-01-11T22:19:21.1149050Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_exec_order 2023-01-11T22:19:21.1150019Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_exec_order (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_exec_order_offr53p2) 2023-01-11T22:19:21.1150391Z 2023-01-11T22:19:21.1150509Z Running tests... 2023-01-11T22:19:21.1153374Z ---------------------------------------------------------------------- 2023-01-11T22:19:21.1154034Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_exec_order 2023-01-11T22:19:21.1154581Z test_invalid_first_iter_order_sharding_strategy_ShardingStrategy_FULL_SHARD (__main__.TestFSDPExecOrder) 2023-01-11T22:19:21.1155223Z Tests that FSDP errors if the all-gather order differs across ranks ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:19:21.1156523Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50448 2023-01-11T22:19:21.1157000Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50449 2023-01-11T22:19:21.1157983Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1158470Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1159067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1159690Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1160343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1160825Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1161422Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1161935Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1162429Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:21.1162956Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:21.1163639Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:21.1164374Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:21.1164934Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:21.1165417Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:21.1166757Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1167680Z warnings.warn( 2023-01-11T22:19:21.1168919Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1169765Z warnings.warn( 2023-01-11T22:19:21.1170030Z dist init r=1, world=2 2023-01-11T22:19:21.1170280Z dist init r=0, world=2 2023-01-11T22:19:21.1170638Z ok (5.430s) 2023-01-11T22:19:21.1171031Z test_invalid_first_iter_order_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestFSDPExecOrder) 2023-01-11T22:19:21.1171805Z Tests that FSDP errors if the all-gather order differs across ranks ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50527 2023-01-11T22:19:21.1172380Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50528 2023-01-11T22:19:21.1173563Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1174036Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1174651Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1175152Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1175768Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1176360Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1176978Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1177477Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1177935Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:21.1178543Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:21.1179258Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:21.1179988Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:21.1180523Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:21.1181032Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:21.1182376Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1183216Z warnings.warn( 2023-01-11T22:19:21.1184442Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1185260Z warnings.warn( 2023-01-11T22:19:21.1185524Z dist init r=0, world=2 2023-01-11T22:19:21.1185791Z dist init r=1, world=2 2023-01-11T22:19:21.1186028Z ok (3.812s) 2023-01-11T22:19:21.1186471Z test_invalid_later_iter_order_sharding_strategy_ShardingStrategy_FULL_SHARD_iters_before_path_change_1 (__main__.TestFSDPExecOrder) 2023-01-11T22:19:21.1187268Z Tests that FSDP warns the user if the all-gather order changes after ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50606 2023-01-11T22:19:21.1187850Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50607 2023-01-11T22:19:21.1188479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1188961Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1189578Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1190080Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1190680Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1191160Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1191772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1192250Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1192729Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:21.1193254Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:21.1193951Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:21.1194747Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:21.1195302Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:21.1195805Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:21.1197198Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1198030Z warnings.warn( 2023-01-11T22:19:21.1199261Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1200100Z warnings.warn( 2023-01-11T22:19:21.1200366Z dist init r=0, world=2 2023-01-11T22:19:21.1200612Z dist init r=1, world=2 2023-01-11T22:19:21.1200867Z ok (3.813s) 2023-01-11T22:19:21.1201315Z test_invalid_later_iter_order_sharding_strategy_ShardingStrategy_FULL_SHARD_iters_before_path_change_3 (__main__.TestFSDPExecOrder) 2023-01-11T22:19:21.1202116Z Tests that FSDP warns the user if the all-gather order changes after ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50689 2023-01-11T22:19:21.1202679Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50690 2023-01-11T22:19:21.1203321Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1203797Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1204417Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1204900Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1205518Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1205998Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1206587Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1207090Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1207568Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:21.1208093Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:21.1208772Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:21.1209510Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:21.1210066Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:21.1210569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:21.1211885Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1212806Z warnings.warn( 2023-01-11T22:19:21.1214683Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1215535Z warnings.warn( 2023-01-11T22:19:21.1215801Z dist init r=0, world=2 2023-01-11T22:19:21.1216049Z dist init r=1, world=2 2023-01-11T22:19:21.1216310Z ok (3.812s) 2023-01-11T22:19:21.1216764Z test_invalid_later_iter_order_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_iters_before_path_change_1 (__main__.TestFSDPExecOrder) 2023-01-11T22:19:21.1217558Z Tests that FSDP warns the user if the all-gather order changes after ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50772 2023-01-11T22:19:21.1218135Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50773 2023-01-11T22:19:21.1218784Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1219267Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1219854Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1220358Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1220983Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1221444Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1222050Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1222550Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1223035Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:21.1223544Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:21.1224241Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:21.1224970Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:21.1225532Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:21.1226012Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:21.1227353Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1228190Z warnings.warn( 2023-01-11T22:19:21.1229420Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1230359Z warnings.warn( 2023-01-11T22:19:21.1230605Z dist init r=0, world=2 2023-01-11T22:19:21.1230875Z dist init r=1, world=2 2023-01-11T22:19:21.1231129Z ok (3.812s) 2023-01-11T22:19:21.1231617Z test_invalid_later_iter_order_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_iters_before_path_change_3 (__main__.TestFSDPExecOrder) 2023-01-11T22:19:21.1232432Z Tests that FSDP warns the user if the all-gather order changes after ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50855 2023-01-11T22:19:21.1233011Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50856 2023-01-11T22:19:21.1233663Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1234125Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1234733Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1235233Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1235831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1236310Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1236916Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1237412Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1237871Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:21.1238399Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:21.1239094Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:21.1239826Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:21.1240360Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:21.1240860Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:21.1242203Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1243046Z warnings.warn( 2023-01-11T22:19:21.1244255Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1245122Z warnings.warn( 2023-01-11T22:19:21.1245390Z dist init r=0, world=2 2023-01-11T22:19:21.1245659Z dist init r=1, world=2 2023-01-11T22:19:21.1245893Z ok (3.812s) 2023-01-11T22:19:21.1246407Z test_train_eval_sharding_strategy_ShardingStrategy_FULL_SHARD (__main__.TestFSDPExecOrder) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50938 2023-01-11T22:19:21.1247107Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50939 2023-01-11T22:19:21.1247762Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1248221Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1248829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1249385Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1249998Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1250475Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1251085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1251588Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1252047Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:21.1252575Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:21.1253522Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:21.1254263Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:21.1254800Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:21.1255297Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:21.1256636Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1257477Z warnings.warn( 2023-01-11T22:19:21.1258685Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1259520Z warnings.warn( 2023-01-11T22:19:21.1259783Z dist init r=1, world=2 2023-01-11T22:19:21.1260048Z dist init r=0, world=2 2023-01-11T22:19:21.1260288Z ok (3.812s) 2023-01-11T22:19:21.1260803Z test_train_eval_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestFSDPExecOrder) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51021 2023-01-11T22:19:21.1261416Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51022 2023-01-11T22:19:21.1262064Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1262528Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1263137Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1263637Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1264236Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:21.1264843Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:21.1265463Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:21.1265966Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:21.1266422Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:21.1267024Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:21.1267736Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:21.1268466Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:21.1268999Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:21.1269508Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:21.1270851Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1271688Z warnings.warn( 2023-01-11T22:19:21.1272893Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:21.1273731Z warnings.warn( 2023-01-11T22:19:21.1273995Z dist init r=1, world=2 2023-01-11T22:19:21.1274262Z dist init r=0, world=2 2023-01-11T22:19:21.1274500Z ok (3.812s) 2023-01-11T22:19:21.1274657Z 2023-01-11T22:19:21.1274947Z ---------------------------------------------------------------------- 2023-01-11T22:19:21.1275301Z Ran 8 tests in 32.117s 2023-01-11T22:19:21.1275473Z 2023-01-11T22:19:21.1275553Z OK 2023-01-11T22:19:21.1275694Z 2023-01-11T22:19:21.1275825Z Generating XML reports... 
2023-01-11T22:19:21.1276468Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_exec_order/TEST-TestFSDPExecOrder-20230111221848.xml 2023-01-11T22:19:21.1276843Z 2023-01-11T22:19:21.1277239Z ##[endgroup] 2023-01-11T22:19:21.1277872Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_exec_order (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_exec_order_offr53p2) 2023-01-11T22:19:21.1278271Z 2023-01-11T22:19:21.1278585Z Running distributed/fsdp/test_fsdp_sharded_grad_scaler ... [2023-01-11 22:19:21.114974] 2023-01-11T22:19:21.1279348Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_sharded_grad_scaler.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:19:21.115231] 2023-01-11T22:19:56.4720928Z 2023-01-11T22:19:56.4721649Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_sharded_grad_scaler 2023-01-11T22:19:56.4725131Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_sharded_grad_scaler (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_sharded_grad_scaler_zp4lfyi_) 2023-01-11T22:19:56.4725566Z 2023-01-11T22:19:56.4725687Z Running tests... 2023-01-11T22:19:56.4726191Z ---------------------------------------------------------------------- 2023-01-11T22:19:56.4726789Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_sharded_grad_scaler 2023-01-11T22:19:56.4727601Z test_grad_scaling (__main__.TestShardGradScaler) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:19:56.4727941Z ok (1.630s) 2023-01-11T22:19:56.4728295Z test_inf_gradients_skip_optim_step (__main__.TestShardGradScaler) ... ok (0.002s) 2023-01-11T22:19:56.4728740Z test_scaling_unscaling_sparse (__main__.TestShardGradScaler) ... ok (0.006s) 2023-01-11T22:19:56.4732337Z test_fsdp_ddp_parity_with_grad_scaler_offload_false_none_mixed_precision (__main__.TestShardedGradScalerParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51154 2023-01-11T22:19:56.4733519Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51155 2023-01-11T22:19:56.4734228Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4735059Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4735721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4736191Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4736780Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4737237Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4737799Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4738276Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4738748Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:56.4739244Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:56.4739893Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:56.4740596Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:56.4741166Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:56.4741646Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:56.4742136Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4742602Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4743887Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:56.4744738Z warnings.warn( 2023-01-11T22:19:56.4745917Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:56.4746738Z warnings.warn( 2023-01-11T22:19:56.4747123Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4747589Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:19:56.4748129Z dist init r=0, world=2 2023-01-11T22:19:56.4748383Z dist init r=1, world=2 2023-01-11T22:19:56.4748612Z ok (3.914s) 2023-01-11T22:19:56.4749145Z test_fsdp_ddp_parity_with_grad_scaler_offload_false_none_none (__main__.TestShardedGradScalerParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51237 2023-01-11T22:19:56.4749765Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51238 2023-01-11T22:19:56.4750483Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4750936Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4751525Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4752010Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4752580Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4753038Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4753629Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4754099Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4754536Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:56.4755034Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:56.4755689Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:56.4756375Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:56.4756879Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:56.4757359Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:56.4757837Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4758304Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4759581Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:56.4760365Z warnings.warn( 2023-01-11T22:19:56.4761526Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:56.4762292Z warnings.warn( 2023-01-11T22:19:56.4762674Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4763142Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:19:56.4763506Z dist init r=1, world=2 2023-01-11T22:19:56.4763762Z dist init r=0, world=2 2023-01-11T22:19:56.4763983Z ok (4.012s) 2023-01-11T22:19:56.4764531Z test_fsdp_ddp_parity_with_grad_scaler_offload_false_shard_grad_op_mixed_precision (__main__.TestShardedGradScalerParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51320 2023-01-11T22:19:56.4765262Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51321 2023-01-11T22:19:56.4765892Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4766328Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4766966Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4767451Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4768021Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4768477Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4769056Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4769522Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4769956Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:56.4770448Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:56.4771106Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:56.4771780Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:56.4772305Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:56.4772776Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:56.4773832Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4774308Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4775596Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:56.4776376Z warnings.warn( 2023-01-11T22:19:56.4777531Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:56.4778306Z warnings.warn( 2023-01-11T22:19:56.4778662Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4779147Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:19:56.4779511Z dist init r=0, world=2 2023-01-11T22:19:56.4779778Z dist init r=1, world=2 2023-01-11T22:19:56.4780002Z ok (3.912s) 2023-01-11T22:19:56.4780538Z test_fsdp_ddp_parity_with_grad_scaler_offload_false_shard_grad_op_none (__main__.TestShardedGradScalerParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51403 2023-01-11T22:19:56.4781163Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51404 2023-01-11T22:19:56.4781757Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4782338Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4782919Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4783395Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4784026Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4784493Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4785075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4785526Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4785983Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:56.4786692Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:56.4787358Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:56.4788031Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:56.4788563Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:56.4789035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:56.4789510Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4789974Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4791248Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:56.4792024Z warnings.warn( 2023-01-11T22:19:56.4793167Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:56.4793938Z warnings.warn( 2023-01-11T22:19:56.4794297Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4794785Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:19:56.4795147Z dist init r=1, world=2 2023-01-11T22:19:56.4795382Z dist init r=0, world=2 2023-01-11T22:19:56.4795622Z ok (3.812s) 2023-01-11T22:19:56.4796161Z test_fsdp_ddp_parity_with_grad_scaler_offload_true_none_mixed_precision (__main__.TestShardedGradScalerParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51486 2023-01-11T22:19:56.4796782Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51487 2023-01-11T22:19:56.4797378Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4797828Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4798411Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4798969Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4799536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4799984Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4800558Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4801063Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4801533Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:56.4802029Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:56.4802687Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:56.4803362Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:56.4803882Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:56.4804351Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:56.4804828Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4805294Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4806565Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:56.4807339Z warnings.warn( 2023-01-11T22:19:56.4808495Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:19:56.4809265Z warnings.warn( 2023-01-11T22:19:56.4809514Z File "<string>", line 1, in <module> 2023-01-11T22:19:56.4809887Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:19:56.4810266Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:19:56.4810621Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:19:56.4811000Z return self._bootstrap(parent_sentinel) 2023-01-11T22:19:56.4811389Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:19:56.4811707Z self.run() 2023-01-11T22:19:56.4812042Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:19:56.4812411Z self._target(*self._args, **self._kwargs) 2023-01-11T22:19:56.4813478Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:19:56.4813903Z self.run_test(test_name, pipe) 2023-01-11T22:19:56.4814449Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:19:56.4814853Z getattr(self, test_name)() 2023-01-11T22:19:56.4815354Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:19:56.4815722Z fn() 2023-01-11T22:19:56.4816347Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:19:56.4816727Z test(self, **param_kwargs) 2023-01-11T22:19:56.4817245Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:19:56.4817636Z return func(*args, **kwargs) 2023-01-11T22:19:56.4818151Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py", line 171, in test_fsdp_ddp_parity_with_grad_scaler 2023-01-11T22:19:56.4818571Z self._test_fsdp_parity( 
2023-01-11T22:19:56.4819100Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:19:56.4819526Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:19:56.4820065Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:19:56.4820468Z output = model(*input) 2023-01-11T22:19:56.4820955Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:19:56.4821336Z return forward_call(*args, **kwargs) 2023-01-11T22:19:56.4821866Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:19:56.4822324Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:19:56.4822888Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:19:56.4823265Z _lazy_init(state, module) 2023-01-11T22:19:56.4823774Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:19:56.4824208Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:19:56.4824798Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:19:56.4825214Z handle.init_flat_param_attributes() 2023-01-11T22:19:56.4825724Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:19:56.4826105Z return func(*args, **kwargs) 2023-01-11T22:19:56.4826621Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:19:56.4827009Z p_assert( 2023-01-11T22:19:56.4827484Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:19:56.4827864Z traceback.print_stack() 2023-01-11T22:19:56.4828134Z File "<string>", line 1, in <module> 2023-01-11T22:19:56.4828513Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:19:56.4828886Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:19:56.4829243Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:19:56.4829619Z return self._bootstrap(parent_sentinel) 2023-01-11T22:19:56.4830010Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:19:56.4830329Z self.run() 2023-01-11T22:19:56.4830664Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:19:56.4831030Z self._target(*self._args, **self._kwargs) 2023-01-11T22:19:56.4831547Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:19:56.4831916Z self.run_test(test_name, pipe) 2023-01-11T22:19:56.4832441Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:19:56.4832835Z getattr(self, test_name)() 2023-01-11T22:19:56.4833328Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:19:56.4833779Z fn() 2023-01-11T22:19:56.4834282Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:19:56.4834679Z test(self, **param_kwargs) 2023-01-11T22:19:56.4835172Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:19:56.4835560Z return func(*args, **kwargs) 2023-01-11T22:19:56.4836075Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py", line 171, in 
test_fsdp_ddp_parity_with_grad_scaler 2023-01-11T22:19:56.4836484Z self._test_fsdp_parity( 2023-01-11T22:19:56.4837012Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:19:56.4837438Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:19:56.4837993Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:19:56.4838374Z output = model(*input) 2023-01-11T22:19:56.4838850Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:19:56.4839232Z return forward_call(*args, **kwargs) 2023-01-11T22:19:56.4839757Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:19:56.4840208Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:19:56.4840768Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:19:56.4841160Z _lazy_init(state, module) 2023-01-11T22:19:56.4841646Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:19:56.4842081Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:19:56.4842674Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:19:56.4843084Z handle.init_flat_param_attributes() 2023-01-11T22:19:56.4843601Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:19:56.4843985Z return func(*args, **kwargs) 2023-01-11T22:19:56.4844503Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:19:56.4844891Z p_assert( 
2023-01-11T22:19:56.4845361Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:19:56.4845738Z traceback.print_stack() 2023-01-11T22:19:56.4846115Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4846609Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4847133Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4847596Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4847963Z dist init r=0, world=2 2023-01-11T22:19:56.4848434Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:19:56.4848879Z dist init r=1, world=2 2023-01-11T22:19:56.4849328Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:19:56.4849751Z ok (3.912s) 2023-01-11T22:19:56.4850272Z test_fsdp_ddp_parity_with_grad_scaler_offload_true_none_none (__main__.TestShardedGradScalerParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51569 2023-01-11T22:19:56.4850967Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51570 2023-01-11T22:19:56.4851571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4852025Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4852606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4853774Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4854417Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4854871Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4855451Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4855903Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4856370Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:56.4856866Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:56.4857502Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:56.4858195Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:56.4858718Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:56.4859194Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:56.4859656Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4860138Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4861421Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:56.4862197Z warnings.warn( 2023-01-11T22:19:56.4863352Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:19:56.4864109Z warnings.warn( 2023-01-11T22:19:56.4864381Z File "", line 1, in 2023-01-11T22:19:56.4864757Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:19:56.4865135Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:19:56.4865491Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:19:56.4865864Z return self._bootstrap(parent_sentinel) 2023-01-11T22:19:56.4866253Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:19:56.4866569Z self.run() 2023-01-11T22:19:56.4866902Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:19:56.4867268Z self._target(*self._args, **self._kwargs) 2023-01-11T22:19:56.4867769Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:19:56.4868274Z self.run_test(test_name, pipe) 2023-01-11T22:19:56.4868808Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:19:56.4869209Z getattr(self, test_name)() 2023-01-11T22:19:56.4869709Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:19:56.4870083Z fn() 2023-01-11T22:19:56.4870637Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:19:56.4871025Z test(self, **param_kwargs) 2023-01-11T22:19:56.4871313Z File "", line 1, in 2023-01-11T22:19:56.4871853Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:19:56.4872250Z return func(*args, **kwargs) 2023-01-11T22:19:56.4872677Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py", line 171, in test_fsdp_ddp_parity_with_grad_scaler 
2023-01-11T22:19:56.4873101Z self._test_fsdp_parity( 2023-01-11T22:19:56.4873465Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:19:56.4873824Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:19:56.4874364Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:19:56.4874788Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:19:56.4875149Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:19:56.4875525Z return self._bootstrap(parent_sentinel) 2023-01-11T22:19:56.4876081Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:19:56.4876478Z output = model(*input) 2023-01-11T22:19:56.4876821Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:19:56.4877165Z self.run() 2023-01-11T22:19:56.4877632Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:19:56.4877998Z return forward_call(*args, **kwargs) 2023-01-11T22:19:56.4878368Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:19:56.4878746Z self._target(*self._args, **self._kwargs) 2023-01-11T22:19:56.4879287Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:19:56.4879743Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:19:56.4880277Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:19:56.4880664Z self.run_test(test_name, pipe) 2023-01-11T22:19:56.4881174Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:19:56.4881574Z _lazy_init(state, module) 
2023-01-11T22:19:56.4882091Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:19:56.4882491Z getattr(self, test_name)() 2023-01-11T22:19:56.4882984Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:19:56.4883419Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:19:56.4883972Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:19:56.4884319Z fn() 2023-01-11T22:19:56.4884840Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:19:56.4885269Z handle.init_flat_param_attributes() 2023-01-11T22:19:56.4885794Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:19:56.4886275Z test(self, **param_kwargs) 2023-01-11T22:19:56.4886782Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:19:56.4887168Z return func(*args, **kwargs) 2023-01-11T22:19:56.4887665Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:19:56.4888122Z return func(*args, **kwargs) 2023-01-11T22:19:56.4888674Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:19:56.4889041Z p_assert( 2023-01-11T22:19:56.4889463Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py", line 171, in test_fsdp_ddp_parity_with_grad_scaler 2023-01-11T22:19:56.4889878Z self._test_fsdp_parity( 2023-01-11T22:19:56.4890370Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 
2023-01-11T22:19:56.4890739Z traceback.print_stack() 2023-01-11T22:19:56.4891259Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:19:56.4891685Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:19:56.4892228Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:19:56.4892622Z output = model(*input) 2023-01-11T22:19:56.4893766Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:19:56.4894165Z return forward_call(*args, **kwargs) 2023-01-11T22:19:56.4894702Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:19:56.4895157Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:19:56.4895730Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:19:56.4896107Z _lazy_init(state, module) 2023-01-11T22:19:56.4896613Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:19:56.4897047Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:19:56.4897640Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:19:56.4898059Z handle.init_flat_param_attributes() 2023-01-11T22:19:56.4898571Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:19:56.4898951Z return func(*args, **kwargs) 2023-01-11T22:19:56.4899466Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:19:56.4899855Z p_assert( 2023-01-11T22:19:56.4900326Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:19:56.4900704Z traceback.print_stack() 2023-01-11T22:19:56.4901078Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4901568Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4902058Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4902523Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4902879Z dist init r=1, world=2 2023-01-11T22:19:56.4903354Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:19:56.4903797Z dist init r=0, world=2 2023-01-11T22:19:56.4904381Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:19:56.4904812Z ok (3.913s) 2023-01-11T22:19:56.4905360Z test_fsdp_ddp_parity_with_grad_scaler_offload_true_shard_grad_op_mixed_precision (__main__.TestShardedGradScalerParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51652 2023-01-11T22:19:56.4906044Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51653 2023-01-11T22:19:56.4906692Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4907141Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4907719Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4908174Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4908753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4909196Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4909768Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4910218Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4910672Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:56.4911166Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:56.4911804Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:56.4912490Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:56.4913020Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:56.4913489Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:56.4913947Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4914429Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4915699Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:56.4916477Z warnings.warn( 2023-01-11T22:19:56.4917638Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:19:56.4918382Z warnings.warn( 2023-01-11T22:19:56.4918654Z File "", line 1, in 2023-01-11T22:19:56.4919031Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:19:56.4919388Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:19:56.4919757Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:19:56.4920130Z return self._bootstrap(parent_sentinel) 2023-01-11T22:19:56.4920600Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:19:56.4920919Z self.run() 2023-01-11T22:19:56.4921255Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:19:56.4921618Z self._target(*self._args, **self._kwargs) 2023-01-11T22:19:56.4922122Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:19:56.4922573Z self.run_test(test_name, pipe) 2023-01-11T22:19:56.4923116Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:19:56.4923518Z getattr(self, test_name)() 2023-01-11T22:19:56.4924016Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:19:56.4924383Z fn() 2023-01-11T22:19:56.4924875Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:19:56.4925258Z test(self, **param_kwargs) 2023-01-11T22:19:56.4925771Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:19:56.4926162Z return func(*args, **kwargs) 2023-01-11T22:19:56.4926426Z File "", line 1, in 2023-01-11T22:19:56.4926877Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py", line 171, in test_fsdp_ddp_parity_with_grad_scaler 
2023-01-11T22:19:56.4927292Z self._test_fsdp_parity( 2023-01-11T22:19:56.4927810Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:19:56.4928212Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:19:56.4928599Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:19:56.4928973Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:19:56.4929511Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:19:56.4929903Z output = model(*input) 2023-01-11T22:19:56.4930252Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:19:56.4930622Z return self._bootstrap(parent_sentinel) 2023-01-11T22:19:56.4931115Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:19:56.4931498Z return forward_call(*args, **kwargs) 2023-01-11T22:19:56.4931877Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:19:56.4932197Z self.run() 2023-01-11T22:19:56.4932711Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:19:56.4933641Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:19:56.4934032Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:19:56.4934409Z self._target(*self._args, **self._kwargs) 2023-01-11T22:19:56.4934968Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:19:56.4935366Z _lazy_init(state, module) 2023-01-11T22:19:56.4935844Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:19:56.4936234Z self.run_test(test_name, pipe) 
2023-01-11T22:19:56.4936756Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:19:56.4937170Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:19:56.4937723Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:19:56.4938114Z getattr(self, test_name)() 2023-01-11T22:19:56.4938790Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:19:56.4939204Z handle.init_flat_param_attributes() 2023-01-11T22:19:56.4939740Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:19:56.4940117Z fn() 2023-01-11T22:19:56.4940636Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:19:56.4941042Z return func(*args, **kwargs) 2023-01-11T22:19:56.4941572Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:19:56.4941970Z test(self, **param_kwargs) 2023-01-11T22:19:56.4942485Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:19:56.4942866Z p_assert( 2023-01-11T22:19:56.4943365Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:19:56.4943739Z return func(*args, **kwargs) 2023-01-11T22:19:56.4944233Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:19:56.4944610Z traceback.print_stack() 2023-01-11T22:19:56.4945041Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py", line 171, in test_fsdp_ddp_parity_with_grad_scaler 
2023-01-11T22:19:56.4945460Z self._test_fsdp_parity( 2023-01-11T22:19:56.4945978Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:19:56.4946401Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:19:56.4946936Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:19:56.4947335Z output = model(*input) 2023-01-11T22:19:56.4947854Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:19:56.4948226Z return forward_call(*args, **kwargs) 2023-01-11T22:19:56.4948774Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:19:56.4949232Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:19:56.4949796Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:19:56.4950172Z _lazy_init(state, module) 2023-01-11T22:19:56.4950679Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:19:56.4951109Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:19:56.4951673Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:19:56.4952111Z handle.init_flat_param_attributes() 2023-01-11T22:19:56.4952632Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:19:56.4953014Z return func(*args, **kwargs) 2023-01-11T22:19:56.4953526Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:19:56.4953908Z p_assert( 2023-01-11T22:19:56.4954380Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:19:56.4954746Z traceback.print_stack() 2023-01-11T22:19:56.4955139Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4955630Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4956113Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4956661Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4957026Z dist init r=0, world=2 2023-01-11T22:19:56.4957497Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:19:56.4957922Z dist init r=1, world=2 2023-01-11T22:19:56.4958455Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:19:56.4958894Z ok (4.012s) 2023-01-11T22:19:56.4959432Z test_fsdp_ddp_parity_with_grad_scaler_offload_true_shard_grad_op_none (__main__.TestShardedGradScalerParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51735 2023-01-11T22:19:56.4960031Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51736 2023-01-11T22:19:56.4960657Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4961115Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4961693Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4962149Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4962730Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:19:56.4963176Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:19:56.4963818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:19:56.4964285Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:19:56.4964743Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:19:56.4965237Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:19:56.4965872Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:19:56.4966561Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:19:56.4967088Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:19:56.4967564Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:19:56.4968025Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4968512Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.4969792Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:19:56.4970577Z warnings.warn( 2023-01-11T22:19:56.4971705Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:19:56.4972560Z warnings.warn( 2023-01-11T22:19:56.4972829Z File "<string>", line 1, in <module> 2023-01-11T22:19:56.4973729Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:19:56.4974089Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:19:56.4974467Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:19:56.4974843Z return self._bootstrap(parent_sentinel) 2023-01-11T22:19:56.4975332Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:19:56.4975668Z self.run() 2023-01-11T22:19:56.4976003Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:19:56.4976371Z self._target(*self._args, **self._kwargs) 2023-01-11T22:19:56.4976661Z File "<string>", line 1, in <module> 2023-01-11T22:19:56.4977189Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:19:56.4977594Z self.run_test(test_name, pipe) 2023-01-11T22:19:56.4978104Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:19:56.4978498Z getattr(self, test_name)() 2023-01-11T22:19:56.4978857Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:19:56.4979231Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:19:56.4979752Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:19:56.4980123Z fn() 2023-01-11T22:19:56.4980446Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:19:56.4980799Z return self._bootstrap(parent_sentinel) 2023-01-11T22:19:56.4981350Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:19:56.4981746Z test(self, **param_kwargs) 2023-01-11T22:19:56.4982099Z File 
"/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:19:56.4982433Z self.run() 2023-01-11T22:19:56.4982930Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:19:56.4983323Z return func(*args, **kwargs) 2023-01-11T22:19:56.4983660Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:19:56.4984036Z self._target(*self._args, **self._kwargs) 2023-01-11T22:19:56.4984493Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py", line 171, in test_fsdp_ddp_parity_with_grad_scaler 2023-01-11T22:19:56.4984892Z self._test_fsdp_parity( 2023-01-11T22:19:56.4985392Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:19:56.4985783Z self.run_test(test_name, pipe) 2023-01-11T22:19:56.4986292Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:19:56.4986715Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:19:56.4987255Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:19:56.4987648Z getattr(self, test_name)() 2023-01-11T22:19:56.4988163Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:19:56.4988555Z output = model(*input) 2023-01-11T22:19:56.4989064Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:19:56.4989412Z fn() 2023-01-11T22:19:56.4989862Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:19:56.4990242Z return forward_call(*args, **kwargs) 2023-01-11T22:19:56.4990771Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:19:56.4991247Z test(self, **param_kwargs) 2023-01-11T22:19:56.4991789Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:19:56.4992248Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:19:56.4992844Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:19:56.4993253Z return func(*args, **kwargs) 2023-01-11T22:19:56.4993783Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:19:56.4994177Z _lazy_init(state, module) 2023-01-11T22:19:56.4994599Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py", line 171, in test_fsdp_ddp_parity_with_grad_scaler 2023-01-11T22:19:56.4995023Z self._test_fsdp_parity( 2023-01-11T22:19:56.4995534Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:19:56.4995950Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:19:56.4996510Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:19:56.4996932Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:19:56.4997516Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:19:56.4997927Z handle.init_flat_param_attributes() 2023-01-11T22:19:56.4998479Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:19:56.4998871Z output = model(*input) 2023-01-11T22:19:56.4999343Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:19:56.4999734Z return func(*args, **kwargs) 2023-01-11T22:19:56.5000215Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:19:56.5000601Z return forward_call(*args, **kwargs) 2023-01-11T22:19:56.5001127Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:19:56.5001515Z p_assert( 2023-01-11T22:19:56.5002024Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:19:56.5002459Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:19:56.5002989Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:19:56.5003370Z traceback.print_stack() 2023-01-11T22:19:56.5003895Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:19:56.5004276Z _lazy_init(state, module) 2023-01-11T22:19:56.5004781Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:19:56.5005213Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:19:56.5005781Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:19:56.5006214Z handle.init_flat_param_attributes() 2023-01-11T22:19:56.5006727Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:19:56.5007105Z return func(*args, **kwargs) 2023-01-11T22:19:56.5007617Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 
2023-01-11T22:19:56.5007996Z p_assert( 2023-01-11T22:19:56.5008563Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:19:56.5008928Z traceback.print_stack() 2023-01-11T22:19:56.5009325Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.5009813Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.5010388Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.5010864Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:19:56.5011224Z dist init r=1, world=2 2023-01-11T22:19:56.5011696Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:19:56.5012121Z dist init r=0, world=2 2023-01-11T22:19:56.5012586Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:19:56.5013294Z ok (3.912s) 2023-01-11T22:19:56.5013449Z 2023-01-11T22:19:56.5013738Z ---------------------------------------------------------------------- 2023-01-11T22:19:56.5014053Z Ran 11 tests in 33.038s 2023-01-11T22:19:56.5014224Z 2023-01-11T22:19:56.5014319Z OK 2023-01-11T22:19:56.5014453Z 2023-01-11T22:19:56.5014580Z Generating XML reports... 
2023-01-11T22:19:56.5015203Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_sharded_grad_scaler/TEST-TestShardGradScaler-20230111221923.xml 2023-01-11T22:19:56.5016095Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_sharded_grad_scaler/TEST-TestShardedGradScalerParityWithDDP-20230111221923.xml 2023-01-11T22:19:56.5016519Z 2023-01-11T22:19:56.5016916Z ##[endgroup] 2023-01-11T22:19:56.5017555Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_sharded_grad_scaler (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_sharded_grad_scaler_zp4lfyi_) 2023-01-11T22:19:56.5017948Z 2023-01-11T22:19:56.5018249Z Running distributed/fsdp/test_fsdp_freezing_weights ... [2023-01-11 22:19:56.472666] 2023-01-11T22:19:56.5018961Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_freezing_weights.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:19:56.472920] 2023-01-11T22:20:34.1369990Z 2023-01-11T22:20:34.1371021Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_freezing_weights 2023-01-11T22:20:34.1372109Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_freezing_weights (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_freezing_weights_74ki4wo9) 2023-01-11T22:20:34.1372507Z 2023-01-11T22:20:34.1372625Z Running tests... 2023-01-11T22:20:34.1376317Z ---------------------------------------------------------------------- 2023-01-11T22:20:34.1377180Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_freezing_weights 2023-01-11T22:20:34.1377886Z test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_False (__main__.TestFreezingWeights) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:20:34.1378698Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51853 2023-01-11T22:20:34.1379163Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51854 2023-01-11T22:20:34.1379792Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1380254Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1380838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1381518Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1382379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1382838Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1383410Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1383872Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1384454Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:34.1384970Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:34.1385644Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:34.1386621Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:20:34.1387158Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:34.1387641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:34.1388120Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1388591Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1389070Z dist init r=0, world=2 2023-01-11T22:20:34.1389311Z dist init r=1, world=2 2023-01-11T22:20:34.1389555Z ok (5.851s) 2023-01-11T22:20:34.1390135Z test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_True (__main__.TestFreezingWeights) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51936 2023-01-11T22:20:34.1390837Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51937 2023-01-11T22:20:34.1391450Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1391902Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1392479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1392933Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1393517Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1393964Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1394539Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1394987Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1395444Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:34.1395943Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:34.1396601Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:34.1397275Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:34.1397797Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:34.1398265Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:34.1398727Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1399208Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1400483Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:20:34.1401350Z warnings.warn( 2023-01-11T22:20:34.1402566Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:20:34.1403350Z warnings.warn( 2023-01-11T22:20:34.1403588Z dist init r=0, world=2 2023-01-11T22:20:34.1403840Z dist init r=1, world=2 2023-01-11T22:20:34.1404083Z ok (4.213s) 2023-01-11T22:20:34.1404627Z test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_False (__main__.TestFreezingWeights) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52019 2023-01-11T22:20:34.1405281Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52020 2023-01-11T22:20:34.1405902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1406337Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1406920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1407388Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1407965Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1408394Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1408968Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1409428Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1409885Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:34.1410361Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:34.1411017Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed 
store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:34.1411707Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:34.1412215Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:34.1412687Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:34.1413526Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1414013Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1414366Z dist init r=0, world=2 2023-01-11T22:20:34.1414623Z dist init r=1, world=2 2023-01-11T22:20:34.1414864Z ok (4.112s) 2023-01-11T22:20:34.1415407Z test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_True (__main__.TestFreezingWeights) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52102 2023-01-11T22:20:34.1416059Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52103 2023-01-11T22:20:34.1416800Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1417253Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1417812Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1418283Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1418942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1419401Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1419961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1420427Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1420879Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:34.1421360Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:34.1422017Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:34.1422701Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:20:34.1423225Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:34.1423678Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:34.1424148Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1424627Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1425901Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:20:34.1426654Z warnings.warn( 2023-01-11T22:20:34.1427795Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:20:34.1428564Z warnings.warn( 2023-01-11T22:20:34.1428821Z dist init r=0, world=2 2023-01-11T22:20:34.1429053Z dist init r=1, world=2 2023-01-11T22:20:34.1429291Z ok (4.112s) 2023-01-11T22:20:34.1429849Z test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_False (__main__.TestFreezingWeights) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52185 2023-01-11T22:20:34.1430498Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52186 2023-01-11T22:20:34.1431094Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1431544Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1432118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1432586Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1433226Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1433671Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1434244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1434687Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1435195Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:34.1435700Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:34.1436360Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:34.1437027Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:20:34.1437553Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:34.1438022Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:34.1438499Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1438961Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1439332Z dist init r=0, world=2 2023-01-11T22:20:34.1439587Z dist init r=1, world=2 2023-01-11T22:20:34.1439809Z ok (4.413s) 2023-01-11T22:20:34.1440369Z test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_True (__main__.TestFreezingWeights) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52268 2023-01-11T22:20:34.1441014Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52269 2023-01-11T22:20:34.1441628Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1442062Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1442640Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1443108Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1443670Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1444118Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1444691Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1445153Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1445586Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:34.1446084Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:34.1446736Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:34.1447422Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:34.1447928Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:34.1448434Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:34.1448913Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1449380Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1450646Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:20:34.1451490Z warnings.warn( 2023-01-11T22:20:34.1452696Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:20:34.1453682Z warnings.warn( 2023-01-11T22:20:34.1453923Z dist init r=0, world=2 2023-01-11T22:20:34.1454177Z dist init r=1, world=2 2023-01-11T22:20:34.1454419Z ok (4.313s) 2023-01-11T22:20:34.1454964Z test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_False (__main__.TestFreezingWeights) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52381 2023-01-11T22:20:34.1455614Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52382 2023-01-11T22:20:34.1456236Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1456686Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1457241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1457711Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1458295Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1458743Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1459296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1459760Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1460217Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:34.1460715Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:34.1461354Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed 
store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:34.1462041Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:34.1462567Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:34.1463021Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:34.1463501Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1463983Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1464348Z dist init r=1, world=2 2023-01-11T22:20:34.1464586Z dist init r=0, world=2 2023-01-11T22:20:34.1464830Z ok (4.112s) 2023-01-11T22:20:34.1465393Z test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_True (__main__.TestFreezingWeights) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52464 2023-01-11T22:20:34.1466024Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52465 2023-01-11T22:20:34.1466750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1467200Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1467774Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1468224Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1468869Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:20:34.1469327Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:20:34.1469887Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:20:34.1470353Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:20:34.1470803Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:20:34.1471302Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:20:34.1471936Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:20:34.1472622Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:20:34.1473148Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:20:34.1473618Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:20:34.1474072Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1474553Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:20:34.1475822Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:20:34.1476596Z warnings.warn( 2023-01-11T22:20:34.1477756Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:20:34.1478500Z warnings.warn( 2023-01-11T22:20:34.1478753Z dist init r=0, world=2 2023-01-11T22:20:34.1479008Z dist init r=1, world=2 2023-01-11T22:20:34.1479230Z ok (4.213s) 2023-01-11T22:20:34.1479380Z 2023-01-11T22:20:34.1479656Z ---------------------------------------------------------------------- 2023-01-11T22:20:34.1479988Z Ran 8 tests in 35.340s 2023-01-11T22:20:34.1480151Z 2023-01-11T22:20:34.1480246Z OK 2023-01-11T22:20:34.1480362Z 2023-01-11T22:20:34.1480488Z Generating XML reports... 2023-01-11T22:20:34.1481118Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_freezing_weights/TEST-TestFreezingWeights-20230111221958.xml 2023-01-11T22:20:34.1481492Z 2023-01-11T22:20:34.1481883Z ##[endgroup] 2023-01-11T22:20:34.1482516Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_freezing_weights (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_freezing_weights_74ki4wo9) 2023-01-11T22:20:34.1482896Z 2023-01-11T22:20:34.1483183Z Running distributed/_composable/test_fully_shard ... [2023-01-11 22:20:34.137250] 2023-01-11T22:20:34.1484022Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_composable/test_fully_shard.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:20:34.137537] 2023-01-11T22:21:21.5408940Z 2023-01-11T22:21:21.5409436Z Expand the folded group to see the log file of distributed/_composable/test_fully_shard 2023-01-11T22:21:21.5412501Z ##[group]PRINTING LOG FILE of distributed/_composable/test_fully_shard (/var/lib/jenkins/workspace/test/test-reports/distributed-_composable-test_fully_shard_gkkc3cwq) 2023-01-11T22:21:21.5413154Z 2023-01-11T22:21:21.5413281Z Running tests... 
2023-01-11T22:21:21.5413807Z ---------------------------------------------------------------------- 2023-01-11T22:21:21.5414366Z Test results will be stored in test-reports/python-unittest/distributed._composable.test_fully_shard 2023-01-11T22:21:21.5414820Z test_device_id (__main__.TestFSDPInitialization) 2023-01-11T22:21:21.5417635Z Tests passing a ``device_id``. ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:21:21.5418204Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52612 2023-01-11T22:21:21.5418678Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52613 2023-01-11T22:21:21.5419371Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5419814Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5420412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5420896Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5421487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5421916Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5422501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5422974Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5423413Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:21:21.5423917Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:21:21.5424584Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 
nodes. 2023-01-11T22:21:21.5428184Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5428767Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:21:21.5429302Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:21:21.5429683Z dist init r=0, world=2 2023-01-11T22:21:21.5429995Z dist init r=1, world=2 2023-01-11T22:21:21.5430226Z ok (4.828s) 2023-01-11T22:21:21.5430552Z test_manual_fully_shard (__main__.TestFSDPInitialization) 2023-01-11T22:21:21.5431068Z Tests manually applying ``fully_shard``. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52691 2023-01-11T22:21:21.5431558Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52692 2023-01-11T22:21:21.5432208Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5432669Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5433250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5433705Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5434284Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5434956Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5435527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5435996Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5436545Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 
2023-01-11T22:21:21.5437061Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:21:21.5437709Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5438397Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5438925Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:21:21.5439401Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:21:21.5439741Z dist init r=0, world=2 2023-01-11T22:21:21.5439996Z dist init r=1, world=2 2023-01-11T22:21:21.5440235Z ok (3.210s) 2023-01-11T22:21:21.5440540Z test_materialize_meta_module (__main__.TestFSDPInitialization) 2023-01-11T22:21:21.5441161Z Tests materializing a meta-device module. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52770 2023-01-11T22:21:21.5441677Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52771 2023-01-11T22:21:21.5442274Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5442726Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5443302Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5443776Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5444338Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5444781Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5445354Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5445804Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5446257Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:21:21.5446754Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:21:21.5447411Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5448082Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5448600Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:21:21.5449069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:21:21.5449430Z dist init r=1, world=2 2023-01-11T22:21:21.5449665Z dist init r=0, world=2 2023-01-11T22:21:21.5449904Z ok (3.311s) 2023-01-11T22:21:21.5450237Z test_nested_fully_shard_shared_state (__main__.TestFSDPInitialization) 2023-01-11T22:21:21.5450790Z Tests that nested applications of ``fully_shard`` share the expected ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52849 2023-01-11T22:21:21.5451337Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52850 2023-01-11T22:21:21.5452041Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5452476Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5453459Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5453942Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5454718Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5455186Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5455764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5456218Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5456676Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:21:21.5457174Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:21:21.5457833Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5458506Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:21:21.5459034Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:21:21.5459506Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:21:21.5459845Z dist init r=0, world=2 2023-01-11T22:21:21.5460099Z dist init r=1, world=2 2023-01-11T22:21:21.5460341Z ok (3.812s) 2023-01-11T22:21:21.5460624Z test_policy (__main__.TestFSDPInitialization) 2023-01-11T22:21:21.5461245Z Tests passing a ``policy`` for pseudo-auto-wrapping. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52928 2023-01-11T22:21:21.5461783Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52929 2023-01-11T22:21:21.5462387Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5462818Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5463396Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5463864Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5464447Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5464873Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5465452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5465922Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5466356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:21:21.5466852Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:21:21.5467514Z 
INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5468201Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5468701Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:21:21.5469170Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:21:21.5469524Z dist init r=1, world=2 2023-01-11T22:21:21.5469866Z dist init r=0, world=2 2023-01-11T22:21:21.5470110Z ok (3.310s) 2023-01-11T22:21:21.5470427Z test_sync_module_states (__main__.TestFSDPInitialization) 2023-01-11T22:21:21.5470912Z Tests passing ``sync_module_states=True``. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53007 2023-01-11T22:21:21.5471395Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53008 2023-01-11T22:21:21.5472072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5472536Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5473101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5473575Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5474155Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5474610Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5475166Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5475629Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:21:21.5476089Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:21:21.5476568Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:21:21.5477223Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5477911Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5478434Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:21:21.5478891Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:21:21.5479248Z dist init r=0, world=2 2023-01-11T22:21:21.5479502Z dist init r=1, world=2 2023-01-11T22:21:21.5479727Z ok (3.311s) 2023-01-11T22:21:21.5480063Z test_state_dict_save_load_flow (__main__.TestFSDPModelCheckpointing) 2023-01-11T22:21:21.5480640Z E2E test of save + load with rank0_only + CPU offload for TransformerWithSharedParams ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53086 2023-01-11T22:21:21.5481207Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53087 2023-01-11T22:21:21.5481800Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5482254Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5482829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5483301Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5483858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5484302Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5484874Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5485324Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5485779Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:21:21.5486273Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:21:21.5486928Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5487674Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:21:21.5488195Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:21:21.5488665Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:21:21.5489082Z dist init r=0, world=2 2023-01-11T22:21:21.5489332Z dist init r=1, world=2 2023-01-11T22:21:21.5489576Z ok (4.714s) 2023-01-11T22:21:21.5489935Z test_state_dict_save_load_root_fully_shard (__main__.TestFSDPModelCheckpointing) 2023-01-11T22:21:21.5490470Z Tests that the full state dict saved from a module with ``fully_shard`` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53165 2023-01-11T22:21:21.5491003Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53166 2023-01-11T22:21:21.5491623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5492058Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5492639Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5493410Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5494004Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5494433Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5495005Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5495469Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5495927Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:21:21.5496408Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to 
store for rank: 0 2023-01-11T22:21:21.5497064Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5497754Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5498256Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:21:21.5498722Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:21:21.5499070Z dist init r=1, world=2 2023-01-11T22:21:21.5499324Z dist init r=0, world=2 2023-01-11T22:21:21.5499549Z ok (3.411s) 2023-01-11T22:21:21.5499916Z test_state_dict_save_load_submodule_fully_shard (__main__.TestFSDPModelCheckpointing) 2023-01-11T22:21:21.5500478Z Tests that the full state dict saved from a module with ``fully_shard`` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53244 2023-01-11T22:21:21.5500992Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53245 2023-01-11T22:21:21.5501607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5502065Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5502639Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5503089Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5503666Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5504110Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5504777Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 
2023-01-11T22:21:21.5505241Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5505695Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:21:21.5506185Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:21:21.5506900Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5507611Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5508133Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:21:21.5508603Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:21:21.5508950Z dist init r=0, world=2 2023-01-11T22:21:21.5509200Z dist init r=1, world=2 2023-01-11T22:21:21.5509441Z ok (3.411s) 2023-01-11T22:21:21.5509814Z test_optim_state_dict_save_load (__main__.TestFSDPOptimStateDict) ... skip: The test currently fails on CI. (0.001s) 2023-01-11T22:21:21.5510360Z test_optim_state_dict_submodule_fully_shard (__main__.TestFSDPOptimStateDict) ... skip: The test currently fails on CI. (0.001s) 2023-01-11T22:21:21.5510798Z test_training (__main__.TestFSDPRuntime) 2023-01-11T22:21:21.5511247Z Tests training (forward, backward, optimizer). ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53323 2023-01-11T22:21:21.5511756Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53324 2023-01-11T22:21:21.5512368Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5512820Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5513381Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5513846Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5514422Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5514870Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5515431Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5515897Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5516349Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:21:21.5516823Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:21:21.5517479Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5518160Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:21:21.5518676Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:21:21.5519130Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:21:21.5519482Z dist init r=1, world=2 2023-01-11T22:21:21.5519738Z dist init r=0, world=2 2023-01-11T22:21:21.5519963Z ok (4.112s) 2023-01-11T22:21:21.5520262Z test_unshard_reshard_order (__main__.TestFSDPRuntime) 2023-01-11T22:21:21.5520763Z Tests that the unshard/reshard order matches between ``fully_shard`` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53406 2023-01-11T22:21:21.5521292Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53407 2023-01-11T22:21:21.5521992Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5522443Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5523021Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5523528Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5524125Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5524573Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5525141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5525583Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5526039Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:21:21.5526533Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 
2023-01-11T22:21:21.5527188Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5527861Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5528382Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:21:21.5528854Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:21:21.5529191Z dist init r=0, world=2 2023-01-11T22:21:21.5529441Z dist init r=1, world=2 2023-01-11T22:21:21.5529680Z ok (3.812s) 2023-01-11T22:21:21.5530100Z test_float16_on_one_submodule (__main__.TestMixedPrecision) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53489 2023-01-11T22:21:21.5530634Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53490 2023-01-11T22:21:21.5531245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5531694Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5532253Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5532726Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5533724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:21:21.5534171Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:21:21.5534722Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:21:21.5535195Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:21:21.5535737Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:21:21.5536212Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:21:21.5536869Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5537556Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:21:21.5538077Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:21:21.5538524Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:21:21.5538876Z dist init r=0, world=2 2023-01-11T22:21:21.5539241Z dist init r=1, world=2 2023-01-11T22:21:21.5539465Z ok (3.812s) 2023-01-11T22:21:21.5539615Z 2023-01-11T22:21:21.5539898Z ---------------------------------------------------------------------- 2023-01-11T22:21:21.5540234Z Ran 14 tests in 45.057s 2023-01-11T22:21:21.5540399Z 2023-01-11T22:21:21.5540512Z OK (skipped=2) 2023-01-11T22:21:21.5540648Z 2023-01-11T22:21:21.5540772Z Generating XML reports... 
2023-01-11T22:21:21.5541480Z Generated XML report: test-reports/python-unittest/distributed._composable.test_fully_shard/TEST-TestFSDPInitialization-20230111222036.xml 2023-01-11T22:21:21.5542338Z Generated XML report: test-reports/python-unittest/distributed._composable.test_fully_shard/TEST-TestFSDPModelCheckpointing-20230111222036.xml 2023-01-11T22:21:21.5543122Z Generated XML report: test-reports/python-unittest/distributed._composable.test_fully_shard/TEST-TestFSDPRuntime-20230111222036.xml 2023-01-11T22:21:21.5543891Z Generated XML report: test-reports/python-unittest/distributed._composable.test_fully_shard/TEST-TestMixedPrecision-20230111222036.xml 2023-01-11T22:21:21.5544692Z Generated XML report: test-reports/python-unittest/distributed._composable.test_fully_shard/TEST-TestFSDPOptimStateDict-20230111222036.xml 2023-01-11T22:21:21.5545062Z 2023-01-11T22:21:21.5545526Z ##[endgroup] 2023-01-11T22:21:21.5546133Z FINISHED PRINTING LOG FILE of distributed/_composable/test_fully_shard (/var/lib/jenkins/workspace/test/test-reports/distributed-_composable-test_fully_shard_gkkc3cwq) 2023-01-11T22:21:21.5546503Z 2023-01-11T22:21:21.5546748Z Running distributed/test_store ... [2023-01-11 22:21:21.541019] 2023-01-11T22:21:21.5547417Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_store.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:21:21.541322] 2023-01-11T22:23:25.6582870Z 2023-01-11T22:23:25.6583546Z Expand the folded group to see the log file of distributed/test_store 2023-01-11T22:23:25.6585884Z ##[group]PRINTING LOG FILE of distributed/test_store (/var/lib/jenkins/workspace/test/test-reports/distributed-test_store_sql4hr3j) 2023-01-11T22:23:25.6587213Z , <__main__.FileStoreTest testMethod=test_init_pg_and_rpc_with_same_file>, <__main__.FileStoreTest testMethod=test_refcount>, <__main__.FileStoreTest testMethod=test_set_get>]> 2023-01-11T22:23:25.6588059Z test_compare_set (__main__.FileStoreTest) 2023-01-11T22:23:25.6588438Z test_init_pg_and_rpc_with_same_file (__main__.FileStoreTest) 2023-01-11T22:23:25.6588781Z test_refcount (__main__.FileStoreTest) 2023-01-11T22:23:25.6589083Z test_set_get (__main__.FileStoreTest) 2023-01-11T22:23:25.6589526Z , <__main__.HashStoreTest testMethod=test_set_get>]> 2023-01-11T22:23:25.6593008Z test_compare_set (__main__.HashStoreTest) 2023-01-11T22:23:25.6593854Z test_set_get (__main__.HashStoreTest) 2023-01-11T22:23:25.6594645Z , <__main__.PrefixFileStoreTest testMethod=test_set_get>]> 2023-01-11T22:23:25.6595869Z test_compare_set (__main__.PrefixFileStoreTest) 2023-01-11T22:23:25.6596369Z test_set_get (__main__.PrefixFileStoreTest) 2023-01-11T22:23:25.6596805Z ]> 2023-01-11T22:23:25.6597249Z test_get_underlying_store (__main__.PrefixStoreTest) 2023-01-11T22:23:25.6598136Z , <__main__.PrefixTCPStoreTest testMethod=test_set_get>]> 2023-01-11T22:23:25.6599155Z test_compare_set (__main__.PrefixTCPStoreTest) 2023-01-11T22:23:25.6599714Z test_set_get (__main__.PrefixTCPStoreTest) 2023-01-11T22:23:25.6600128Z ]> 2023-01-11T22:23:25.6600501Z test_set_get (__main__.PythonStoreTest) 2023-01-11T22:23:25.6601169Z ]> 2023-01-11T22:23:25.6601570Z test_nominal (__main__.RendezvousEnvTest) 2023-01-11T22:23:25.6602046Z , <__main__.RendezvousFileTest testMethod=test_nominal>]> 2023-01-11T22:23:25.6602775Z test_common_errors (__main__.RendezvousFileTest) 
2023-01-11T22:23:25.6603412Z test_nominal (__main__.RendezvousFileTest) 2023-01-11T22:23:25.6604880Z , <__main__.RendezvousTCPTest testMethod=test_dns_timeout>, <__main__.RendezvousTCPTest testMethod=test_nominal>, <__main__.RendezvousTCPTest testMethod=test_tcp_store_timeout_set>]> 2023-01-11T22:23:25.6606296Z test_common_errors (__main__.RendezvousTCPTest) 2023-01-11T22:23:25.6606912Z test_dns_timeout (__main__.RendezvousTCPTest) 2023-01-11T22:23:25.6607541Z test_nominal (__main__.RendezvousTCPTest) 2023-01-11T22:23:25.6608119Z test_tcp_store_timeout_set (__main__.RendezvousTCPTest) 2023-01-11T22:23:25.6609014Z , <__main__.RendezvousTest testMethod=test_url_with_node_params>]> 2023-01-11T22:23:25.6609890Z test_unknown_handler (__main__.RendezvousTest) 2023-01-11T22:23:25.6610456Z test_url_with_node_params (__main__.RendezvousTest) 2023-01-11T22:23:25.6612428Z , <__main__.TCPStoreTest testMethod=test_compare_set>, <__main__.TCPStoreTest testMethod=test_init_pg_and_rpc_with_same_socket>, <__main__.TCPStoreTest testMethod=test_multi_worker_with_fixed_world_size>, <__main__.TCPStoreTest testMethod=test_multi_worker_with_nonfixed_world_size>, <__main__.TCPStoreTest testMethod=test_multitenancy>, <__main__.TCPStoreTest testMethod=test_numkeys_delkeys>, <__main__.TCPStoreTest testMethod=test_set_get>]> 2023-01-11T22:23:25.6614524Z test_address_already_in_use (__main__.TCPStoreTest) 2023-01-11T22:23:25.6614860Z test_compare_set (__main__.TCPStoreTest) 2023-01-11T22:23:25.6615209Z test_init_pg_and_rpc_with_same_socket (__main__.TCPStoreTest) 2023-01-11T22:23:25.6615567Z test_multi_worker_with_fixed_world_size (__main__.TCPStoreTest) 2023-01-11T22:23:25.6615946Z test_multi_worker_with_nonfixed_world_size (__main__.TCPStoreTest) 2023-01-11T22:23:25.6616299Z test_multitenancy (__main__.TCPStoreTest) 2023-01-11T22:23:25.6616609Z test_numkeys_delkeys (__main__.TCPStoreTest) 2023-01-11T22:23:25.6616922Z test_set_get (__main__.TCPStoreTest) 2023-01-11T22:23:25.6617592Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6618029Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6618610Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6619088Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6619319Z 2023-01-11T22:23:25.6619432Z Running tests... 2023-01-11T22:23:25.6619824Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6620347Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6620814Z test_compare_set (__main__.FileStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6621136Z ok (1.620s) 2023-01-11T22:23:25.6621287Z 2023-01-11T22:23:25.6621552Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6621883Z Ran 1 test in 1.620s 2023-01-11T22:23:25.6622045Z 2023-01-11T22:23:25.6622140Z OK 2023-01-11T22:23:25.6622256Z 2023-01-11T22:23:25.6622383Z Generating XML reports... 2023-01-11T22:23:25.6622936Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-FileStoreTest-20230111222125.xml 2023-01-11T22:23:25.6623790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6624225Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6624800Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6625270Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6625499Z 2023-01-11T22:23:25.6625691Z Running tests... 
2023-01-11T22:23:25.6626104Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6626622Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6627108Z test_init_pg_and_rpc_with_same_file (__main__.FileStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6627593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:25.6628260Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:23:25.6628657Z ok (1.709s) 2023-01-11T22:23:25.6628806Z 2023-01-11T22:23:25.6629072Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6629380Z Ran 1 test in 1.709s 2023-01-11T22:23:25.6629540Z 2023-01-11T22:23:25.6629633Z OK 2023-01-11T22:23:25.6629766Z 2023-01-11T22:23:25.6629892Z Generating XML reports... 2023-01-11T22:23:25.6630431Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-FileStoreTest-20230111222129.xml 2023-01-11T22:23:25.6631104Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6631558Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6632132Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6632587Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6632818Z 2023-01-11T22:23:25.6632927Z Running tests... 2023-01-11T22:23:25.6633331Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6633831Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6634291Z test_refcount (__main__.FileStoreTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6634627Z ok (1.603s) 2023-01-11T22:23:25.6634774Z 2023-01-11T22:23:25.6635043Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6635351Z Ran 1 test in 1.603s 2023-01-11T22:23:25.6635516Z 2023-01-11T22:23:25.6635609Z OK 2023-01-11T22:23:25.6635742Z 2023-01-11T22:23:25.6635868Z Generating XML reports... 2023-01-11T22:23:25.6636402Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-FileStoreTest-20230111222132.xml 2023-01-11T22:23:25.6637081Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6637529Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6638102Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6638551Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6638784Z 2023-01-11T22:23:25.6638894Z Running tests... 2023-01-11T22:23:25.6639296Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6639794Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6640250Z test_set_get (__main__.FileStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6640584Z ok (1.628s) 2023-01-11T22:23:25.6640732Z 2023-01-11T22:23:25.6640996Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6641389Z Ran 1 test in 1.629s 2023-01-11T22:23:25.6641552Z 2023-01-11T22:23:25.6641647Z OK 2023-01-11T22:23:25.6641781Z 2023-01-11T22:23:25.6641908Z Generating XML reports... 
2023-01-11T22:23:25.6642448Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-FileStoreTest-20230111222136.xml 2023-01-11T22:23:25.6643182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6643646Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6644227Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6644679Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6644908Z 2023-01-11T22:23:25.6645021Z Running tests... 2023-01-11T22:23:25.6645426Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6645933Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6646398Z test_compare_set (__main__.HashStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6646736Z ok (1.620s) 2023-01-11T22:23:25.6646885Z 2023-01-11T22:23:25.6647149Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6647457Z Ran 1 test in 1.621s 2023-01-11T22:23:25.6647618Z 2023-01-11T22:23:25.6647715Z OK 2023-01-11T22:23:25.6647849Z 2023-01-11T22:23:25.6647973Z Generating XML reports... 
2023-01-11T22:23:25.6648507Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-HashStoreTest-20230111222140.xml 2023-01-11T22:23:25.6649170Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6649621Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6650193Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6650654Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6650887Z 2023-01-11T22:23:25.6650999Z Running tests... 2023-01-11T22:23:25.6651405Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6651905Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6652359Z test_set_get (__main__.HashStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6652692Z ok (1.623s) 2023-01-11T22:23:25.6652838Z 2023-01-11T22:23:25.6654029Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6654349Z Ran 1 test in 1.623s 2023-01-11T22:23:25.6654512Z 2023-01-11T22:23:25.6654605Z OK 2023-01-11T22:23:25.6654737Z 2023-01-11T22:23:25.6654863Z Generating XML reports... 
2023-01-11T22:23:25.6655398Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-HashStoreTest-20230111222144.xml 2023-01-11T22:23:25.6656074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6656522Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6657095Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6657579Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6657811Z 2023-01-11T22:23:25.6657921Z Running tests... 2023-01-11T22:23:25.6658327Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6658824Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6659299Z test_compare_set (__main__.PrefixFileStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6659655Z ok (1.607s) 2023-01-11T22:23:25.6659924Z 2023-01-11T22:23:25.6660199Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6660507Z Ran 1 test in 1.607s 2023-01-11T22:23:25.6660669Z 2023-01-11T22:23:25.6660764Z OK 2023-01-11T22:23:25.6660897Z 2023-01-11T22:23:25.6661023Z Generating XML reports... 
2023-01-11T22:23:25.6661580Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-PrefixFileStoreTest-20230111222148.xml 2023-01-11T22:23:25.6662341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6662804Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6663370Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6663842Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6664072Z 2023-01-11T22:23:25.6664190Z Running tests... 2023-01-11T22:23:25.6664596Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6665096Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6665564Z test_set_get (__main__.PrefixFileStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6665911Z ok (1.624s) 2023-01-11T22:23:25.6666058Z 2023-01-11T22:23:25.6666307Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6666635Z Ran 1 test in 1.624s 2023-01-11T22:23:25.6666796Z 2023-01-11T22:23:25.6666891Z OK 2023-01-11T22:23:25.6667022Z 2023-01-11T22:23:25.6667147Z Generating XML reports... 
2023-01-11T22:23:25.6667702Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-PrefixFileStoreTest-20230111222152.xml 2023-01-11T22:23:25.6668393Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6668847Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6669401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6669873Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6670105Z 2023-01-11T22:23:25.6670214Z Running tests... 2023-01-11T22:23:25.6670620Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6671127Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6671561Z test_get_underlying_store (__main__.PrefixStoreTest) ... ok (0.003s) 2023-01-11T22:23:25.6671789Z 2023-01-11T22:23:25.6672053Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6672353Z Ran 1 test in 0.003s 2023-01-11T22:23:25.6672515Z 2023-01-11T22:23:25.6672609Z OK 2023-01-11T22:23:25.6672742Z 2023-01-11T22:23:25.6672868Z Generating XML reports... 
2023-01-11T22:23:25.6673432Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-PrefixStoreTest-20230111222155.xml 2023-01-11T22:23:25.6674088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6674531Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6675112Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6675584Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6675795Z 2023-01-11T22:23:25.6675905Z Running tests... 2023-01-11T22:23:25.6676309Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6676828Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6677285Z test_compare_set (__main__.PrefixTCPStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6677731Z ok (1.617s) 2023-01-11T22:23:25.6677879Z 2023-01-11T22:23:25.6678157Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6678460Z Ran 1 test in 1.618s 2023-01-11T22:23:25.6678621Z 2023-01-11T22:23:25.6678716Z OK 2023-01-11T22:23:25.6678852Z 2023-01-11T22:23:25.6678976Z Generating XML reports... 
2023-01-11T22:23:25.6679608Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-PrefixTCPStoreTest-20230111222158.xml 2023-01-11T22:23:25.6680291Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6680740Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6681317Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6681764Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6682000Z 2023-01-11T22:23:25.6682109Z Running tests... 2023-01-11T22:23:25.6682515Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6683187Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6683639Z test_set_get (__main__.PrefixTCPStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6683985Z ok (1.622s) 2023-01-11T22:23:25.6684136Z 2023-01-11T22:23:25.6684404Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6684716Z Ran 1 test in 1.623s 2023-01-11T22:23:25.6684878Z 2023-01-11T22:23:25.6684972Z OK 2023-01-11T22:23:25.6685107Z 2023-01-11T22:23:25.6685232Z Generating XML reports... 
2023-01-11T22:23:25.6685801Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-PrefixTCPStoreTest-20230111222201.xml 2023-01-11T22:23:25.6686470Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6686926Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6687503Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6687974Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6688186Z 2023-01-11T22:23:25.6688298Z Running tests... 2023-01-11T22:23:25.6688705Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6689225Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6689664Z test_set_get (__main__.PythonStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6689999Z ok (1.600s) 2023-01-11T22:23:25.6690148Z 2023-01-11T22:23:25.6690415Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6690719Z Ran 1 test in 1.600s 2023-01-11T22:23:25.6690889Z 2023-01-11T22:23:25.6690983Z OK 2023-01-11T22:23:25.6691117Z 2023-01-11T22:23:25.6691242Z Generating XML reports... 
2023-01-11T22:23:25.6691801Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-PythonStoreTest-20230111222205.xml 2023-01-11T22:23:25.6692458Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6693437Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6694286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6694741Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6694972Z 2023-01-11T22:23:25.6695081Z Running tests... 2023-01-11T22:23:25.6695485Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6696004Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6696569Z test_nominal (__main__.RendezvousEnvTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6696910Z ok (1.647s) 2023-01-11T22:23:25.6697060Z 2023-01-11T22:23:25.6697332Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6697638Z Ran 1 test in 1.647s 2023-01-11T22:23:25.6697799Z 2023-01-11T22:23:25.6697893Z OK 2023-01-11T22:23:25.6698026Z 2023-01-11T22:23:25.6698223Z Generating XML reports... 
2023-01-11T22:23:25.6698806Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-RendezvousEnvTest-20230111222209.xml 2023-01-11T22:23:25.6699466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6699915Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6700488Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6700946Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6701177Z 2023-01-11T22:23:25.6701289Z Running tests... 2023-01-11T22:23:25.6701691Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6702209Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6702666Z test_common_errors (__main__.RendezvousFileTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6703016Z ok (1.618s) 2023-01-11T22:23:25.6703167Z 2023-01-11T22:23:25.6703431Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6703736Z Ran 1 test in 1.618s 2023-01-11T22:23:25.6703898Z 2023-01-11T22:23:25.6703990Z OK 2023-01-11T22:23:25.6704126Z 2023-01-11T22:23:25.6704250Z Generating XML reports... 
2023-01-11T22:23:25.6704817Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-RendezvousFileTest-20230111222213.xml 2023-01-11T22:23:25.6705487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6705934Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6706506Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6706955Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6707188Z 2023-01-11T22:23:25.6707299Z Running tests... 2023-01-11T22:23:25.6707701Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6708219Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6708664Z test_nominal (__main__.RendezvousFileTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6709009Z ok (1.629s) 2023-01-11T22:23:25.6709158Z 2023-01-11T22:23:25.6709423Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6709736Z Ran 1 test in 1.630s 2023-01-11T22:23:25.6709897Z 2023-01-11T22:23:25.6709990Z OK 2023-01-11T22:23:25.6710124Z 2023-01-11T22:23:25.6710248Z Generating XML reports... 
2023-01-11T22:23:25.6710798Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-RendezvousFileTest-20230111222217.xml 2023-01-11T22:23:25.6711484Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6711934Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6712510Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6712960Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6713191Z 2023-01-11T22:23:25.6713301Z Running tests... 2023-01-11T22:23:25.6713702Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6714329Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6714784Z test_common_errors (__main__.RendezvousTCPTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6715132Z ok (1.621s) 2023-01-11T22:23:25.6715281Z 2023-01-11T22:23:25.6715545Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6715850Z Ran 1 test in 1.621s 2023-01-11T22:23:25.6716069Z 2023-01-11T22:23:25.6716171Z OK 2023-01-11T22:23:25.6716308Z 2023-01-11T22:23:25.6716438Z Generating XML reports... 
2023-01-11T22:23:25.6716993Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-RendezvousTCPTest-20230111222221.xml 2023-01-11T22:23:25.6717675Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6718128Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6718712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6719166Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6719401Z 2023-01-11T22:23:25.6719510Z Running tests... 2023-01-11T22:23:25.6719912Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6720432Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6720887Z test_dns_timeout (__main__.RendezvousTCPTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6721536Z [W socket.cpp:601] [c10d] The IPv6 network addresses of (dnsnotexist, 23456) cannot be retrieved (gai error: -2 - Name or service not known). 2023-01-11T22:23:25.6722071Z [E socket.cpp:860] [c10d] The client socket has timed out after 1s while trying to connect to (dnsnotexist, 23456). 2023-01-11T22:23:25.6722407Z ok (1.620s) 2023-01-11T22:23:25.6722561Z 2023-01-11T22:23:25.6722830Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6723156Z Ran 1 test in 1.620s 2023-01-11T22:23:25.6723317Z 2023-01-11T22:23:25.6723412Z OK 2023-01-11T22:23:25.6723528Z 2023-01-11T22:23:25.6723652Z Generating XML reports... 
2023-01-11T22:23:25.6724212Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-RendezvousTCPTest-20230111222225.xml 2023-01-11T22:23:25.6724894Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6725326Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6725900Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6726366Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6726594Z 2023-01-11T22:23:25.6726703Z Running tests... 2023-01-11T22:23:25.6727094Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6727608Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6728072Z test_nominal (__main__.RendezvousTCPTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6728395Z ok (1.626s) 2023-01-11T22:23:25.6728545Z 2023-01-11T22:23:25.6728815Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6729145Z Ran 1 test in 1.626s 2023-01-11T22:23:25.6729307Z 2023-01-11T22:23:25.6729402Z OK 2023-01-11T22:23:25.6729517Z 2023-01-11T22:23:25.6729641Z Generating XML reports... 
2023-01-11T22:23:25.6730203Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-RendezvousTCPTest-20230111222228.xml 2023-01-11T22:23:25.6730886Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6731315Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6731994Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6732461Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6732690Z 2023-01-11T22:23:25.6732804Z Running tests... 2023-01-11T22:23:25.6733920Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6734557Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6735061Z test_tcp_store_timeout_set (__main__.RendezvousTCPTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6735401Z ok (11.709s) 2023-01-11T22:23:25.6735552Z 2023-01-11T22:23:25.6735829Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6736156Z Ran 1 test in 11.709s 2023-01-11T22:23:25.6736318Z 2023-01-11T22:23:25.6736413Z OK 2023-01-11T22:23:25.6736535Z 2023-01-11T22:23:25.6736658Z Generating XML reports... 
2023-01-11T22:23:25.6737227Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-RendezvousTCPTest-20230111222232.xml 2023-01-11T22:23:25.6737910Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6738339Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6738914Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6739382Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6739610Z 2023-01-11T22:23:25.6739721Z Running tests... 2023-01-11T22:23:25.6740105Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6740625Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6741094Z test_unknown_handler (__main__.RendezvousTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6741423Z ok (1.616s) 2023-01-11T22:23:25.6741567Z 2023-01-11T22:23:25.6741830Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6742155Z Ran 1 test in 1.617s 2023-01-11T22:23:25.6742317Z 2023-01-11T22:23:25.6742393Z OK 2023-01-11T22:23:25.6742529Z 2023-01-11T22:23:25.6742653Z Generating XML reports... 
2023-01-11T22:23:25.6743209Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-RendezvousTest-20230111222246.xml 2023-01-11T22:23:25.6743880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6744311Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6744884Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6745353Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6745588Z 2023-01-11T22:23:25.6745697Z Running tests... 2023-01-11T22:23:25.6746084Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6746599Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6747073Z test_url_with_node_params (__main__.RendezvousTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6747402Z ok (1.618s) 2023-01-11T22:23:25.6747552Z 2023-01-11T22:23:25.6747818Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6748144Z Ran 1 test in 1.618s 2023-01-11T22:23:25.6748305Z 2023-01-11T22:23:25.6748380Z OK 2023-01-11T22:23:25.6748517Z 2023-01-11T22:23:25.6748642Z Generating XML reports... 
2023-01-11T22:23:25.6749194Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-RendezvousTest-20230111222250.xml 2023-01-11T22:23:25.6749863Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6750402Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6750988Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6751458Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6751688Z 2023-01-11T22:23:25.6751858Z Running tests... 2023-01-11T22:23:25.6752260Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6752780Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6753259Z test_address_already_in_use (__main__.TCPStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6753850Z [W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:46841 (errno: 98 - Address already in use). 2023-01-11T22:23:25.6754453Z [W socket.cpp:426] [c10d] The server socket has failed to bind to 0.0.0.0:46841 (errno: 98 - Address already in use). 2023-01-11T22:23:25.6754913Z [E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address. 2023-01-11T22:23:25.6755248Z ok (1.628s) 2023-01-11T22:23:25.6755379Z 2023-01-11T22:23:25.6755643Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6755970Z Ran 1 test in 1.629s 2023-01-11T22:23:25.6756131Z 2023-01-11T22:23:25.6756229Z OK 2023-01-11T22:23:25.6756362Z 2023-01-11T22:23:25.6756469Z Generating XML reports... 
2023-01-11T22:23:25.6757018Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222254.xml 2023-01-11T22:23:25.6757684Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6758168Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6758728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6759205Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6759437Z 2023-01-11T22:23:25.6759551Z Running tests... 2023-01-11T22:23:25.6759936Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6760452Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6760915Z test_compare_set (__main__.TCPStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6761233Z ok (1.629s) 2023-01-11T22:23:25.6761382Z 2023-01-11T22:23:25.6761650Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6761977Z Ran 1 test in 1.629s 2023-01-11T22:23:25.6762138Z 2023-01-11T22:23:25.6762232Z OK 2023-01-11T22:23:25.6762346Z 2023-01-11T22:23:25.6762472Z Generating XML reports... 
2023-01-11T22:23:25.6763015Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222258.xml 2023-01-11T22:23:25.6763690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6764121Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6764698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6765170Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6765402Z 2023-01-11T22:23:25.6765512Z Running tests... 2023-01-11T22:23:25.6765896Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6766411Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6766905Z test_init_pg_and_rpc_with_same_socket (__main__.TCPStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6767477Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:23:25.6768148Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:23:25.6768545Z ok (1.684s) 2023-01-11T22:23:25.6768694Z 2023-01-11T22:23:25.6768960Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6769270Z Ran 1 test in 1.685s 2023-01-11T22:23:25.6769491Z 2023-01-11T22:23:25.6769591Z OK 2023-01-11T22:23:25.6769728Z 2023-01-11T22:23:25.6769853Z Generating XML reports... 
2023-01-11T22:23:25.6770390Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222302.xml 2023-01-11T22:23:25.6771060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6771508Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6772093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6772545Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6772773Z 2023-01-11T22:23:25.6773336Z Running tests... 2023-01-11T22:23:25.6773835Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6774360Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6774834Z test_multi_worker_with_fixed_world_size (__main__.TCPStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6775199Z ok (1.625s) 2023-01-11T22:23:25.6775349Z 2023-01-11T22:23:25.6775615Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6775925Z Ran 1 test in 1.625s 2023-01-11T22:23:25.6776087Z 2023-01-11T22:23:25.6776181Z OK 2023-01-11T22:23:25.6776316Z 2023-01-11T22:23:25.6776440Z Generating XML reports... 
2023-01-11T22:23:25.6776973Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222305.xml 2023-01-11T22:23:25.6777642Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6778091Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6778669Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6779121Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6779351Z 2023-01-11T22:23:25.6779462Z Running tests... 2023-01-11T22:23:25.6779866Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6780365Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6780864Z test_multi_worker_with_nonfixed_world_size (__main__.TCPStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6781237Z ok (1.632s) 2023-01-11T22:23:25.6781387Z 2023-01-11T22:23:25.6781651Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6781960Z Ran 1 test in 1.632s 2023-01-11T22:23:25.6782119Z 2023-01-11T22:23:25.6782214Z OK 2023-01-11T22:23:25.6782348Z 2023-01-11T22:23:25.6782474Z Generating XML reports... 
2023-01-11T22:23:25.6783005Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222309.xml 2023-01-11T22:23:25.6783670Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6784118Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6784694Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6785151Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6785496Z 2023-01-11T22:23:25.6785608Z Running tests... 2023-01-11T22:23:25.6786019Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6786521Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6786989Z test_multitenancy (__main__.TCPStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6787333Z ok (1.628s) 2023-01-11T22:23:25.6787484Z 2023-01-11T22:23:25.6787818Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6788142Z Ran 1 test in 1.628s 2023-01-11T22:23:25.6788305Z 2023-01-11T22:23:25.6788400Z OK 2023-01-11T22:23:25.6788535Z 2023-01-11T22:23:25.6788662Z Generating XML reports... 
2023-01-11T22:23:25.6789199Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222313.xml 2023-01-11T22:23:25.6789869Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6790324Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6790896Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6791347Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6791577Z 2023-01-11T22:23:25.6791687Z Running tests... 2023-01-11T22:23:25.6792097Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6792602Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6793070Z test_numkeys_delkeys (__main__.TCPStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6793416Z ok (3.646s) 2023-01-11T22:23:25.6793564Z 2023-01-11T22:23:25.6793830Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6794133Z Ran 1 test in 3.647s 2023-01-11T22:23:25.6794298Z 2023-01-11T22:23:25.6794391Z OK 2023-01-11T22:23:25.6794524Z 2023-01-11T22:23:25.6794648Z Generating XML reports... 
2023-01-11T22:23:25.6795175Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222317.xml 2023-01-11T22:23:25.6795843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:23:25.6796295Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:23:25.6796872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:23:25.6797325Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:23:25.6797552Z 2023-01-11T22:23:25.6797662Z Running tests... 2023-01-11T22:23:25.6798066Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6798566Z Test results will be stored in test-reports/python-unittest/distributed.test_store 2023-01-11T22:23:25.6799028Z test_set_get (__main__.TCPStoreTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:23:25.6799355Z ok (1.626s) 2023-01-11T22:23:25.6799509Z 2023-01-11T22:23:25.6799770Z ---------------------------------------------------------------------- 2023-01-11T22:23:25.6800075Z Ran 1 test in 1.626s 2023-01-11T22:23:25.6800239Z 2023-01-11T22:23:25.6800333Z OK 2023-01-11T22:23:25.6800465Z 2023-01-11T22:23:25.6800590Z Generating XML reports... 2023-01-11T22:23:25.6801118Z Generated XML report: test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222323.xml 2023-01-11T22:23:25.6801433Z 2023-01-11T22:23:25.6801825Z ##[endgroup] 2023-01-11T22:23:25.6802375Z FINISHED PRINTING LOG FILE of distributed/test_store (/var/lib/jenkins/workspace/test/test-reports/distributed-test_store_sql4hr3j) 2023-01-11T22:23:25.6802695Z 2023-01-11T22:23:25.6802942Z Running distributed/fsdp/test_fsdp_misc ... 
[2023-01-11 22:23:25.658840] 2023-01-11T22:23:25.6803703Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_misc.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:23:25.659148] 2023-01-11T22:24:31.7198013Z 2023-01-11T22:24:31.7198773Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_misc 2023-01-11T22:24:31.7199714Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_misc (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_misc_eb0oq710) 2023-01-11T22:24:31.7201606Z 2023-01-11T22:24:31.7202153Z Running tests... 2023-01-11T22:24:31.7202800Z ---------------------------------------------------------------------- 2023-01-11T22:24:31.7203379Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_misc 2023-01-11T22:24:31.7203849Z test_cpu_init_with_sync_module_states (__main__.TestFSDPMisc) 2023-01-11T22:24:31.7204293Z Tests that passing ``sync_module_states=True`` raises an error for ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:24:31.7204799Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54752 2023-01-11T22:24:31.7205259Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54753 2023-01-11T22:24:31.7205892Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7206331Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7206901Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7207353Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7207919Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7208397Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7208993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7209477Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7209921Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7210428Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7211099Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7211800Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:24:31.7212296Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7212771Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7214437Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:24:31.7215305Z warnings.warn( 2023-01-11T22:24:31.7216471Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:24:31.7217273Z warnings.warn( 2023-01-11T22:24:31.7217735Z dist init r=0, world=2 2023-01-11T22:24:31.7217991Z dist init r=1, world=2 2023-01-11T22:24:31.7218244Z ok (4.940s) 2023-01-11T22:24:31.7218518Z test_device_id_auto_wrap (__main__.TestFSDPMisc) 2023-01-11T22:24:31.7219010Z Tests that ``auto_wrap_policy`` propagates ``device_id`` to all ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54831 2023-01-11T22:24:31.7219637Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54832 2023-01-11T22:24:31.7220293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7220730Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7221325Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7221804Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7222399Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7222838Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7223419Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7223889Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7224332Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7224831Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7225489Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7226182Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:24:31.7226688Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7227161Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7227521Z dist init r=1, world=2 2023-01-11T22:24:31.7227777Z dist init r=0, world=2 2023-01-11T22:24:31.7228001Z ok (3.411s) 2023-01-11T22:24:31.7228306Z test_fsdp_cpu_init_stays_on_cpu (__main__.TestFSDPMisc) 2023-01-11T22:24:31.7228980Z Tests that passing a CPU module to FSDP preserves that the wrapped ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54910 2023-01-11T22:24:31.7229503Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54911 2023-01-11T22:24:31.7230118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7230572Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7231153Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7231607Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7232186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7232632Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7233191Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7233659Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7234116Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7234614Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 
2023-01-11T22:24:31.7235256Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7236043Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7236567Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7237038Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7237437Z dist init r=1, world=2 2023-01-11T22:24:31.7237703Z dist init r=0, world=2 2023-01-11T22:24:31.7237948Z ok (3.712s) 2023-01-11T22:24:31.7238234Z test_fsdp_device_id_cpu_offload (__main__.TestFSDPMisc) 2023-01-11T22:24:31.7238727Z Ensures that even if device_id is specified but we have ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54993 2023-01-11T22:24:31.7239250Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54994 2023-01-11T22:24:31.7239862Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7240312Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7240888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7241362Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7241927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7242376Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7242949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7243418Z warnings.warn(f"loaded 
{len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7243854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7244356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7245015Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7245689Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7246213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7246683Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7247040Z dist init r=0, world=2 2023-01-11T22:24:31.7247277Z dist init r=1, world=2 2023-01-11T22:24:31.7247523Z ok (3.311s) 2023-01-11T22:24:31.7247834Z test_fsdp_device_id_use_index_False (__main__.TestFSDPMisc) 2023-01-11T22:24:31.7248291Z Tests the FSDP ``device_id`` argument: ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55072 2023-01-11T22:24:31.7248799Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55073 2023-01-11T22:24:31.7249414Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7249865Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7250427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7250898Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7251479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7251904Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7252482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7253502Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7253965Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7254443Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7255200Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7255914Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:24:31.7256417Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7256888Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7257240Z dist init r=1, world=2 2023-01-11T22:24:31.7257502Z dist init r=0, world=2 2023-01-11T22:24:31.7257730Z ok (3.311s) 2023-01-11T22:24:31.7258042Z test_fsdp_device_id_use_index_True (__main__.TestFSDPMisc) 2023-01-11T22:24:31.7258514Z Tests the FSDP ``device_id`` argument: ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55151 2023-01-11T22:24:31.7258994Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55152 2023-01-11T22:24:31.7259614Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7260071Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7260648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7261133Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7261715Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7262166Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7262724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7263188Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7263645Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7264141Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7264777Z 
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7265466Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7265989Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7266470Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7266811Z dist init r=0, world=2 2023-01-11T22:24:31.7267073Z dist init r=1, world=2 2023-01-11T22:24:31.7267320Z ok (3.312s) 2023-01-11T22:24:31.7267791Z test_fsdp_module_no_compute_grad_use_second_layer_False_sharding_strategy_None (__main__.TestFSDPMisc) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55230 2023-01-11T22:24:31.7268374Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55231 2023-01-11T22:24:31.7268985Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7269437Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7269996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7270574Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7271168Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7271595Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7272223Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7272699Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7273156Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7273632Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7274291Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7274986Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7275509Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7275963Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7276317Z dist init r=0, world=2 2023-01-11T22:24:31.7276575Z dist init r=1, world=2 2023-01-11T22:24:31.7276805Z ok (3.812s) 2023-01-11T22:24:31.7277330Z test_fsdp_module_no_compute_grad_use_second_layer_False_sharding_strategy_ShardingStrategy_NO_SHARD (__main__.TestFSDPMisc) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55313 2023-01-11T22:24:31.7277940Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55314 2023-01-11T22:24:31.7278532Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7278989Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7279564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7280030Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7280589Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7281043Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7281619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7282083Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7282517Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7283009Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7283670Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7284335Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:24:31.7284856Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7285333Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7285691Z dist init r=0, world=2 2023-01-11T22:24:31.7285927Z dist init r=1, world=2 2023-01-11T22:24:31.7286171Z ok (3.712s) 2023-01-11T22:24:31.7286660Z test_fsdp_module_no_compute_grad_use_second_layer_True_sharding_strategy_None (__main__.TestFSDPMisc) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55396 2023-01-11T22:24:31.7287221Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55397 2023-01-11T22:24:31.7287914Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7288361Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7288938Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7289445Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7290039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7290485Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7291055Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7291503Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7291962Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7292456Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7293639Z 
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7294346Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7294869Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7295341Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7295683Z dist init r=1, world=2 2023-01-11T22:24:31.7295941Z dist init r=0, world=2 2023-01-11T22:24:31.7296186Z ok (3.814s) 2023-01-11T22:24:31.7296687Z test_fsdp_module_no_compute_grad_use_second_layer_True_sharding_strategy_ShardingStrategy_NO_SHARD (__main__.TestFSDPMisc) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55479 2023-01-11T22:24:31.7297301Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55480 2023-01-11T22:24:31.7297917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7298370Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7298917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7299369Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7299943Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7300392Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7300984Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7301453Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:24:31.7301913Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7302390Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7303052Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7303742Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7304266Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7304718Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7305176Z dist init r=0, world=2 2023-01-11T22:24:31.7305434Z dist init r=1, world=2 2023-01-11T22:24:31.7305662Z ok (3.812s) 2023-01-11T22:24:31.7306082Z test_fsdp_namedtuple (__main__.TestFSDPMisc) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55562 2023-01-11T22:24:31.7306590Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55563 2023-01-11T22:24:31.7307287Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7307733Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7308322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7308792Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7309352Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7309802Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7310377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7310848Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7311289Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7311788Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7312446Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7313114Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:24:31.7313631Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7314107Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7314465Z dist init r=1, world=2 2023-01-11T22:24:31.7314703Z dist init r=0, world=2 2023-01-11T22:24:31.7314946Z ok (3.311s) 2023-01-11T22:24:31.7315381Z test_fsdp_not_all_outputs_used_in_loss (__main__.TestFSDPMisc) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55641 2023-01-11T22:24:31.7315891Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55642 2023-01-11T22:24:31.7316499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7316946Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7317525Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7317979Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7318560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7319004Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7319579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7320030Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7320487Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7320984Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7321624Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based 
barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7322307Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7322922Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7323397Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7324251Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_misc.py:113: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:24:31.7324987Z self.assertEqual(full_param.storage().size(), 0) 2023-01-11T22:24:31.7325742Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_misc.py:113: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:24:31.7326467Z self.assertEqual(full_param.storage().size(), 0) 2023-01-11T22:24:31.7326771Z dist init r=0, world=2 2023-01-11T22:24:31.7327005Z dist init r=1, world=2 2023-01-11T22:24:31.7327246Z ok (3.814s) 2023-01-11T22:24:31.7327556Z test_fsdp_same_model_across_ranks (__main__.TestFSDPMisc) 2023-01-11T22:24:31.7328055Z FSDP broadcasts model from rank 0 to ensure it starts off with the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55724 2023-01-11T22:24:31.7328593Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55725 2023-01-11T22:24:31.7329215Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7329649Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7330233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7330702Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7331281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7331707Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7332285Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7332752Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7333586Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7334064Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7334730Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7335429Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:24:31.7335933Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7336408Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7336762Z dist init r=0, world=2 2023-01-11T22:24:31.7337019Z dist init r=1, world=2 2023-01-11T22:24:31.7337244Z ok (3.311s) 2023-01-11T22:24:31.7337548Z test_homogeneous_attributes (__main__.TestFSDPMisc) 2023-01-11T22:24:31.7338067Z Tests that passing heterogeneous values for attributes designated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55803 2023-01-11T22:24:31.7338588Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55804 2023-01-11T22:24:31.7339316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7339767Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7340346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7340798Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7341471Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7341935Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7342501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7342967Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7343426Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7343930Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 
2023-01-11T22:24:31.7344566Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7345257Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7345780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7346250Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7346591Z dist init r=1, world=2 2023-01-11T22:24:31.7346850Z dist init r=0, world=2 2023-01-11T22:24:31.7347096Z ok (3.210s) 2023-01-11T22:24:31.7347394Z test_module_device_mismatches_device_id (__main__.TestFSDPMisc) 2023-01-11T22:24:31.7347916Z Tests that specifying a ``device_id`` argument to FSDP for a GPU ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55882 2023-01-11T22:24:31.7348449Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55883 2023-01-11T22:24:31.7349045Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7349493Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7350075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7350545Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7351109Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7351560Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7352136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7352602Z 
warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7353035Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7353532Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7354187Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7354858Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7355381Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7355853Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7356290Z dist init r=0, world=2 2023-01-11T22:24:31.7356527Z dist init r=1, world=2 2023-01-11T22:24:31.7356771Z ok (3.310s) 2023-01-11T22:24:31.7357081Z test_multi_device_not_supported (__main__.TestFSDPMisc) 2023-01-11T22:24:31.7357717Z Tests that wrapping a multi-device module (i.e. with submodules on ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55961 2023-01-11T22:24:31.7358313Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55962 2023-01-11T22:24:31.7358937Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7359388Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7359942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7360413Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7360998Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7361457Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7362032Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7362497Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7362957Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7363432Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7364084Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7364773Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:24:31.7365301Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7365754Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7366104Z dist init r=1, world=2 2023-01-11T22:24:31.7366359Z dist init r=0, world=2 2023-01-11T22:24:31.7366584Z ok (3.311s) 2023-01-11T22:24:31.7366859Z test_no_params (__main__.TestFSDPMisc) 2023-01-11T22:24:31.7367336Z Test that device_id and cpu init work if module has no params ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56040 2023-01-11T22:24:31.7367843Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56041 2023-01-11T22:24:31.7368452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7368899Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7369481Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7369930Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7370507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7370952Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7371509Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7371976Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7372431Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:24:31.7373347Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7374001Z 
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7374800Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:24:31.7375323Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:24:31.7375795Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7376203Z dist init r=0, world=2 2023-01-11T22:24:31.7376470Z dist init r=1, world=2 2023-01-11T22:24:31.7376717Z ok (3.210s) 2023-01-11T22:24:31.7377047Z test_world_size_1_sharding_strategy_warning (__main__.TestFSDPMiscWorldSize1) 2023-01-11T22:24:31.7377591Z Tests that FSDP issues a warning when it switches to using ``NO_SHARD`` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56119 2023-01-11T22:24:31.7378295Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:24:31.7378754Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:24:31.7379310Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:24:31.7379781Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:24:31.7380237Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:24:31.7380873Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
2023-01-11T22:24:31.7381395Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:24:31.7381753Z dist init r=0, world=1 2023-01-11T22:24:31.7381997Z ok (3.109s) 2023-01-11T22:24:31.7382128Z 2023-01-11T22:24:31.7382400Z ---------------------------------------------------------------------- 2023-01-11T22:24:31.7382741Z Ran 18 tests in 63.735s 2023-01-11T22:24:31.7382906Z 2023-01-11T22:24:31.7383004Z OK 2023-01-11T22:24:31.7383138Z 2023-01-11T22:24:31.7383246Z Generating XML reports... 2023-01-11T22:24:31.7383821Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_misc/TEST-TestFSDPMisc-20230111222327.xml 2023-01-11T22:24:31.7384582Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_misc/TEST-TestFSDPMiscWorldSize1-20230111222327.xml 2023-01-11T22:24:31.7384942Z 2023-01-11T22:24:31.7385350Z ##[endgroup] 2023-01-11T22:24:31.7385927Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_misc (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_misc_eb0oq710) 2023-01-11T22:24:31.7386273Z 2023-01-11T22:24:31.7386554Z Running distributed/fsdp/test_fsdp_checkpoint ... [2023-01-11 22:24:31.720022] 2023-01-11T22:24:31.7387251Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_checkpoint.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:24:31.720273] 2023-01-11T22:25:41.2355940Z 2023-01-11T22:25:41.2357208Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_checkpoint 2023-01-11T22:25:41.2378461Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_checkpoint_kn57v2en) 2023-01-11T22:25:41.2379161Z 2023-01-11T22:25:41.2379332Z Running tests... 
2023-01-11T22:25:41.2380220Z ---------------------------------------------------------------------- 2023-01-11T22:25:41.2381291Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_checkpoint 2023-01-11T22:25:41.2382563Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=False)_offload_activations_False_use_orig_params_False (__main__.TestFSDPCheckpoint) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:25:41.2383724Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56194 2023-01-11T22:25:41.2384941Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56195 2023-01-11T22:25:41.2386119Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2387002Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2388345Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2389332Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2390451Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2391266Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2392455Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2393682Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2394569Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:25:41.2395539Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2396913Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based 
barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2398337Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2399411Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2400383Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2401077Z dist init r=1, world=2 2023-01-11T22:25:41.2401561Z dist init r=0, world=2 2023-01-11T22:25:41.2402027Z ok (5.426s) 2023-01-11T22:25:41.2403154Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=False)_offload_activations_False_use_orig_params_True (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56277 2023-01-11T22:25:41.2404512Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56278 2023-01-11T22:25:41.2405754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2406668Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2407819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2408781Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2409951Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2410859Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2412003Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2413338Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2414275Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2415242Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:25:41.2416563Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2417992Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2419029Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2420213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2420911Z dist init r=0, world=2 2023-01-11T22:25:41.2421400Z dist init r=1, world=2 2023-01-11T22:25:41.2421864Z ok (3.913s) 2023-01-11T22:25:41.2423097Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=False)_offload_activations_True_use_orig_params_False (__main__.TestFSDPCheckpoint) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56360 2023-01-11T22:25:41.2424468Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56361 2023-01-11T22:25:41.2425730Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2426638Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2427781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2428738Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2429911Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2430796Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2431960Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2432913Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2433826Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:25:41.2434781Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2436101Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2437499Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:25:41.2438574Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2439542Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2440243Z dist init r=0, world=2 2023-01-11T22:25:41.2440726Z dist init r=1, world=2 2023-01-11T22:25:41.2441192Z ok (3.913s) 2023-01-11T22:25:41.2442317Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=False)_offload_activations_True_use_orig_params_True (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56443 2023-01-11T22:25:41.2443650Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56444 2023-01-11T22:25:41.2444886Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2445800Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2446944Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2447897Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2449084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2449966Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2451116Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2452050Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2453384Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:25:41.2454383Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 
2023-01-11T22:25:41.2455736Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2456986Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2458118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2459052Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2459659Z dist init r=1, world=2 2023-01-11T22:25:41.2460113Z dist init r=0, world=2 2023-01-11T22:25:41.2460525Z ok (3.913s) 2023-01-11T22:25:41.2461510Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=True)_offload_activations_False_use_orig_params_False (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56526 2023-01-11T22:25:41.2462689Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56527 2023-01-11T22:25:41.2463501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2463968Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2464535Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2465050Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2465643Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2466073Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2466653Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2467127Z warnings.warn(f"loaded 
{len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2467585Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:25:41.2468061Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2468713Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2469407Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2469933Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2470387Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2470737Z dist init r=0, world=2 2023-01-11T22:25:41.2470992Z dist init r=1, world=2 2023-01-11T22:25:41.2471222Z ok (3.913s) 2023-01-11T22:25:41.2471786Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=True)_offload_activations_False_use_orig_params_True (__main__.TestFSDPCheckpoint) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56609 2023-01-11T22:25:41.2472434Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56610 2023-01-11T22:25:41.2473047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2473485Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2474063Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2474532Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2475094Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2475679Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2476262Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2476733Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2477171Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:25:41.2477730Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2478404Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2479096Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:25:41.2479601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2480079Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2480434Z dist init r=0, world=2 2023-01-11T22:25:41.2480671Z dist init r=1, world=2 2023-01-11T22:25:41.2480913Z ok (3.913s) 2023-01-11T22:25:41.2481482Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=True)_offload_activations_True_use_orig_params_False (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56692 2023-01-11T22:25:41.2482134Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56693 2023-01-11T22:25:41.2482727Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2483386Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2483992Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2484455Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2485039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2485485Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2486056Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2486507Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2486968Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2487464Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 
2023-01-11T22:25:41.2488121Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2488795Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2489319Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2489792Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2490130Z dist init r=1, world=2 2023-01-11T22:25:41.2490384Z dist init r=0, world=2 2023-01-11T22:25:41.2490630Z ok (3.913s) 2023-01-11T22:25:41.2491194Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=True)_offload_activations_True_use_orig_params_True (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56775 2023-01-11T22:25:41.2491825Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56776 2023-01-11T22:25:41.2492438Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2493745Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2494812Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2495657Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2497094Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2497594Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2498169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2498647Z warnings.warn(f"loaded 
{len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2499103Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:25:41.2499605Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2500252Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2500940Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2501469Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2501923Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2502289Z dist init r=0, world=2 2023-01-11T22:25:41.2502544Z dist init r=1, world=2 2023-01-11T22:25:41.2502787Z ok (3.913s) 2023-01-11T22:25:41.2503338Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=False)_offload_activations_False_use_orig_params_False (__main__.TestFSDPCheckpoint) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56858 2023-01-11T22:25:41.2503999Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56859 2023-01-11T22:25:41.2504616Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2505054Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2505635Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2506107Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2506690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2507121Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2507694Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2508168Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2508629Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2509108Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:25:41.2509758Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2510451Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:25:41.2510955Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2511425Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2511779Z dist init r=1, world=2 2023-01-11T22:25:41.2512037Z dist init r=0, world=2 2023-01-11T22:25:41.2512384Z ok (3.913s) 2023-01-11T22:25:41.2513348Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=False)_offload_activations_False_use_orig_params_True (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56941 2023-01-11T22:25:41.2514622Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56942 2023-01-11T22:25:41.2516081Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2516689Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2517289Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2517768Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2518331Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2518791Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2519368Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2519832Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2520268Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2520766Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 
2023-01-11T22:25:41.2521425Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2522091Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2522615Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2523092Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2523447Z dist init r=0, world=2 2023-01-11T22:25:41.2523681Z dist init r=1, world=2 2023-01-11T22:25:41.2523922Z ok (3.813s) 2023-01-11T22:25:41.2524488Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=False)_offload_activations_True_use_orig_params_False (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57024 2023-01-11T22:25:41.2525121Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57025 2023-01-11T22:25:41.2525737Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2526189Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2526766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2527226Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2527809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2528252Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2528818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2529267Z warnings.warn(f"loaded 
{len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2529717Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:25:41.2530211Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2530844Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2531627Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2532154Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2532627Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2533741Z dist init r=1, world=2 2023-01-11T22:25:41.2534006Z dist init r=0, world=2 2023-01-11T22:25:41.2534359Z ok (3.813s) 2023-01-11T22:25:41.2534928Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=False)_offload_activations_True_use_orig_params_True (__main__.TestFSDPCheckpoint) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57107 2023-01-11T22:25:41.2535579Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57108 2023-01-11T22:25:41.2536211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2536673Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2537232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2537699Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2538283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2538728Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2539279Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2539745Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2540200Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:25:41.2540680Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2541337Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2542067Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:25:41.2543036Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2543949Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2544629Z dist init r=1, world=2 2023-01-11T22:25:41.2545105Z dist init r=0, world=2 2023-01-11T22:25:41.2545519Z ok (3.813s) 2023-01-11T22:25:41.2546594Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=True)_offload_activations_False_use_orig_params_False (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57190 2023-01-11T22:25:41.2547891Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57191 2023-01-11T22:25:41.2549167Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2550051Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2551227Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2552182Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2553359Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2554243Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2555408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2556531Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2557425Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2558436Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 
2023-01-11T22:25:41.2559780Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2561276Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2562339Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2563306Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2564018Z dist init r=1, world=2 2023-01-11T22:25:41.2564468Z dist init r=0, world=2 2023-01-11T22:25:41.2564936Z ok (3.813s) 2023-01-11T22:25:41.2566119Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=True)_offload_activations_False_use_orig_params_True (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57273 2023-01-11T22:25:41.2567471Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57274 2023-01-11T22:25:41.2568714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2569628Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2570794Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2571748Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2573389Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2574332Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2575507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2576402Z warnings.warn(f"loaded 
{len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2577326Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:25:41.2578338Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2579680Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2581071Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2582132Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2583103Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2583830Z dist init r=0, world=2 2023-01-11T22:25:41.2584296Z dist init r=1, world=2 2023-01-11T22:25:41.2584756Z ok (3.812s) 2023-01-11T22:25:41.2585903Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=True)_offload_activations_True_use_orig_params_False (__main__.TestFSDPCheckpoint) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57356 2023-01-11T22:25:41.2587231Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57357 2023-01-11T22:25:41.2588469Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2589376Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2590543Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2591629Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2592819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2593725Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2594868Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2595910Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2596831Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:25:41.2598171Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2599243Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2600574Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:25:41.2601637Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2602610Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2603303Z dist init r=0, world=2 2023-01-11T22:25:41.2603784Z dist init r=1, world=2 2023-01-11T22:25:41.2604237Z ok (3.812s) 2023-01-11T22:25:41.2605343Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=True)_offload_activations_True_use_orig_params_True (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57439 2023-01-11T22:25:41.2606670Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57440 2023-01-11T22:25:41.2607901Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2608813Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2609968Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2610909Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2612076Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2613407Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2614585Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2615507Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2616414Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:25:41.2617728Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 
2 nodes. 2023-01-11T22:25:41.2618827Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2620153Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2621229Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2622179Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2622891Z dist init r=1, world=2 2023-01-11T22:25:41.2623371Z dist init r=0, world=2 2023-01-11T22:25:41.2623810Z ok (3.812s) 2023-01-11T22:25:41.2624783Z test_checkpoint_submodule_use_reentrant_False (__main__.TestFSDPCheckpointSubmodule) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57522 2023-01-11T22:25:41.2625956Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57523 2023-01-11T22:25:41.2627331Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2628244Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2629421Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2630477Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2631662Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:25:41.2632568Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:25:41.2633772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:25:41.2634728Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:25:41.2635631Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:25:41.2636639Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:25:41.2637974Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2639391Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:25:41.2640441Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:25:41.2641409Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:25:41.2642125Z dist init r=0, world=2 2023-01-11T22:25:41.2642588Z dist init r=1, world=2 2023-01-11T22:25:41.2643048Z ok (3.812s) 2023-01-11T22:25:41.2643329Z 2023-01-11T22:25:41.2643849Z ---------------------------------------------------------------------- 2023-01-11T22:25:41.2644463Z Ran 17 tests in 67.233s 2023-01-11T22:25:41.2644780Z 2023-01-11T22:25:41.2644951Z OK 2023-01-11T22:25:41.2645204Z 2023-01-11T22:25:41.2645437Z Generating XML reports... 2023-01-11T22:25:41.2646668Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_checkpoint/TEST-TestFSDPCheckpoint-20230111222433.xml 2023-01-11T22:25:41.2648351Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_checkpoint/TEST-TestFSDPCheckpointSubmodule-20230111222433.xml 2023-01-11T22:25:41.2649148Z 2023-01-11T22:25:41.2649758Z ##[endgroup] 2023-01-11T22:25:41.2651014Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_checkpoint_kn57v2en) 2023-01-11T22:25:41.2651777Z 2023-01-11T22:25:41.2652349Z Running distributed/optim/test_zero_redundancy_optimizer ... 
[2023-01-11 22:25:41.235967] 2023-01-11T22:25:41.2654271Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/optim/test_zero_redundancy_optimizer.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:25:41.236224] 2023-01-11T22:28:21.2458554Z 2023-01-11T22:28:21.2459280Z Expand the folded group to see the log file of distributed/optim/test_zero_redundancy_optimizer 2023-01-11T22:28:21.2464165Z ##[group]PRINTING LOG FILE of distributed/optim/test_zero_redundancy_optimizer (/var/lib/jenkins/workspace/test/test-reports/distributed-optim-test_zero_redundancy_optimizer_0gb5y_75) 2023-01-11T22:28:21.2464620Z 2023-01-11T22:28:21.2464744Z Running tests... 2023-01-11T22:28:21.2465264Z ---------------------------------------------------------------------- 2023-01-11T22:28:21.2465865Z Test results will be stored in test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer 2023-01-11T22:28:21.2466371Z test_add_param_group (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2467121Z Check that ZeroRedundancyOptimizer properly handles adding a new ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:28:21.2467979Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57640 2023-01-11T22:28:21.2468845Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57641 2023-01-11T22:28:21.2469611Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2470223Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2471125Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2471916Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2472854Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2473369Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2474183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2474716Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2475136Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2475620Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2476364Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2476870Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2477759Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2478463Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2478869Z ok (4.237s) 2023-01-11T22:28:21.2479276Z test_collect_shards (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2480024Z Check the state consolidation mechanism and the state dict exposed ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57710 2023-01-11T22:28:21.2480576Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57711 2023-01-11T22:28:21.2481211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2481653Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2482227Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2482670Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2483257Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2483721Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2484609Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2485092Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2485541Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2486020Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2486485Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2486985Z INFO:torch.distributed.distributed_c10d:Added 
key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2487650Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2488436Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2488839Z ok (4.414s) 2023-01-11T22:28:21.2489355Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_False_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2490126Z Check that overlapping DDP with ZeRO using the given method determined ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57794 2023-01-11T22:28:21.2490663Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57795 2023-01-11T22:28:21.2491283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2491735Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2492321Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2492774Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2496734Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2497185Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2497766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2498219Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2498659Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 
2023-01-11T22:28:21.2499132Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2499603Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2500104Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2500763Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2501453Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2502342Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2503396Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2504081Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2504572Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2505032Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2505511Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2505861Z ok (4.113s) 2023-01-11T22:28:21.2506381Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_False_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2507067Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57907 2023-01-11T22:28:21.2507616Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57908 2023-01-11T22:28:21.2508243Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2508851Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2509414Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2509884Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2510541Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2510981Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2511557Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2512023Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2512466Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2512925Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2513408Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2513904Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2514546Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2515232Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2516140Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2517184Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2517853Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2518321Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2518811Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2519293Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2519626Z ok (4.013s) 2023-01-11T22:28:21.2520139Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_True_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2520845Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58020 2023-01-11T22:28:21.2521399Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58021 2023-01-11T22:28:21.2521995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2522446Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2523026Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2523497Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2524058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2524506Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2525081Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2525598Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2526036Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2526511Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2527003Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2527522Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2528195Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2528883Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2529786Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2530822Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2531493Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2531983Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2532462Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2533359Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2533723Z ok (4.113s) 2023-01-11T22:28:21.2534237Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_True_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2534948Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58133 2023-01-11T22:28:21.2535474Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58134 2023-01-11T22:28:21.2536093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2536546Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2537105Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2537580Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2538163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2538613Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2539168Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2539634Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2540074Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2540533Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2541022Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2541519Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2542179Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2542954Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2543862Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2544975Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2545651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2546139Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2546602Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2547082Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2547438Z ok (4.013s) 2023-01-11T22:28:21.2547929Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_False_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2548637Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58246 2023-01-11T22:28:21.2549183Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58247 2023-01-11T22:28:21.2549804Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2550240Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2550817Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2551296Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2551882Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2552311Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2552886Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2553356Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2553778Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2554251Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2554739Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2555235Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2555879Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2556574Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2557481Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2558529Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2559179Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2559665Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2560214Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2560693Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2561028Z ok (4.132s) 2023-01-11T22:28:21.2561587Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_False_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2562298Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58359 2023-01-11T22:28:21.2562829Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58360 2023-01-11T22:28:21.2563451Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2563910Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2564487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2564941Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2565523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2565978Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2566552Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2566999Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2567443Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2567918Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2568392Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2568888Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2569548Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2570240Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2571129Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2572175Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2573226Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2573729Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2574195Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2574680Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2575033Z ok (4.113s) 2023-01-11T22:28:21.2575544Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_True_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2576226Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58472 2023-01-11T22:28:21.2576777Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58473 2023-01-11T22:28:21.2577490Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2577943Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2578504Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2579051Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2579650Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2580078Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2580652Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2581117Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2581563Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2582019Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2582499Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2582996Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2583657Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2584330Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2585235Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2586273Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2586940Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2587413Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2587893Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2588371Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2588701Z ok (4.013s) 2023-01-11T22:28:21.2589207Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_True_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2589912Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58585 2023-01-11T22:28:21.2590458Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58586 2023-01-11T22:28:21.2591054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2591504Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2592085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2592555Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2593118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2593567Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2594213Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2594680Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2595099Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2595569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2596113Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2596595Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2597258Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2597951Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2598861Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2599888Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2600559Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2601046Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2601524Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2601985Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2602333Z ok (4.113s) 2023-01-11T22:28:21.2602848Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_False_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2603557Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58698 2023-01-11T22:28:21.2604085Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58699 2023-01-11T22:28:21.2604700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2605153Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2605710Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2606181Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2606760Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2607210Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2607766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2608238Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2608682Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2609154Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2609622Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2610117Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2610775Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2611516Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2612424Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2613902Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2614584Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2615072Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2615535Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2616016Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2616370Z ok (4.013s) 2023-01-11T22:28:21.2616861Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_False_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2617573Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58811 2023-01-11T22:28:21.2618122Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58812 2023-01-11T22:28:21.2618744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2619179Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2619754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2620228Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2620806Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2621231Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2621808Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2622273Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2622693Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2623165Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2623656Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2624157Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2624797Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2625483Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2626393Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2627440Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2628086Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2628656Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2629135Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2629620Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2629951Z ok (4.113s) 2023-01-11T22:28:21.2630513Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_True_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2631227Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58924 2023-01-11T22:28:21.2631772Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58925 2023-01-11T22:28:21.2632377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2632834Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2633412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2633868Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2634452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2634896Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2635470Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2635921Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2636364Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2636844Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2637317Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2637811Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2638475Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2639167Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2640060Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2641109Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2641785Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2642274Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2642756Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2643220Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2643576Z ok (4.014s) 2023-01-11T22:28:21.2644084Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_True_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2644772Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59037 2023-01-11T22:28:21.2645384Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59038 2023-01-11T22:28:21.2646001Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2646455Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2647061Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2647540Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2648130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2648558Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2649126Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2649597Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2650038Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2650490Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2650977Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2651473Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2652134Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2652802Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2654605Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2655666Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2656336Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2656807Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2657290Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2657766Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2658118Z ok (4.113s) 2023-01-11T22:28:21.2658610Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_False_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2659324Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59150 2023-01-11T22:28:21.2659874Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59151 2023-01-11T22:28:21.2660474Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2660930Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2661511Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2661985Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2662548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2663100Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2663677Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2664148Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2664563Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2665102Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2665602Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2666078Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2666740Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2667429Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2668343Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2669376Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2670041Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2670530Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2671008Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2671470Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2671827Z ok (4.113s) 2023-01-11T22:28:21.2672341Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_False_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2673050Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59263 2023-01-11T22:28:21.2673606Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59264 2023-01-11T22:28:21.2674221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2674676Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2675256Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2675711Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2676297Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2676743Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2677300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2677772Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2678214Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2678692Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2679161Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2679654Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2680383Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2681070Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2682009Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2683066Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2683732Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2684225Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2684690Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2685169Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2685520Z ok (4.013s) 2023-01-11T22:28:21.2686010Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_True_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2686716Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59376 2023-01-11T22:28:21.2687263Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59377 2023-01-11T22:28:21.2687879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2688312Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2688890Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2689359Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2689939Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2690365Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2690942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2691410Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2691831Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2692304Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2692787Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2693650Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2694294Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2694980Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2695887Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2696930Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2697689Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2698157Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2698637Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2699122Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2699525Z ok (4.013s) 2023-01-11T22:28:21.2700042Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_True_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2700744Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59489 2023-01-11T22:28:21.2701292Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59490 2023-01-11T22:28:21.2701901Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2702353Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2702933Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2703386Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2703968Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2704418Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2704990Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2705436Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2705877Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2706356Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2706848Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2707323Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2707987Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2708681Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2709587Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2710623Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2023-01-11T22:28:21.2711295Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2711783Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2712267Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2712726Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2713080Z ok (4.013s) 2023-01-11T22:28:21.2713518Z test_local_optimizer_parity_optimizer_class_str_AdamW_maximize_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2714127Z When combined with DDP, check that a local optimizer gives the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59602 2023-01-11T22:28:21.2714732Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59603 2023-01-11T22:28:21.2715348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2715802Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2716413Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2716890Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2717473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2717902Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2718476Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2718948Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2719390Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2719845Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2720327Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2720829Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2721486Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2722159Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2723231Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:28:21.2724779Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:28:21.2725900Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2726651Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2727396Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2728111Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2728843Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2729650Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2730387Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2731174Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2731898Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2732624Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters 
changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2733660Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2734398Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2735136Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2735848Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2736579Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2737307Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2737778Z ok (4.415s) 2023-01-11T22:28:21.2738190Z test_local_optimizer_parity_optimizer_class_str_AdamW_maximize_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2738816Z When combined with DDP, check that a local optimizer gives the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59685 2023-01-11T22:28:21.2739356Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59686 2023-01-11T22:28:21.2739983Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2740416Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2740992Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2741466Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2742027Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2742470Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2743044Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2743511Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2743932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2744400Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2744888Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2745361Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2746118Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2746806Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2747938Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:28:21.2749437Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:28:21.2750566Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2751311Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2752035Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2752776Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2753515Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2754252Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2754985Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2755694Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2756429Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2757160Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters 
changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2757891Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2758600Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2759324Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2760118Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2760847Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2761653Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2762113Z ok (4.416s) 2023-01-11T22:28:21.2762585Z test_local_optimizer_parity_optimizer_class_str_Adam_maximize_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2763209Z When combined with DDP, check that a local optimizer gives the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59768 2023-01-11T22:28:21.2763733Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59769 2023-01-11T22:28:21.2764361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2764814Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2765390Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2765847Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2766430Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2766878Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2767446Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2767892Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2768332Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2768805Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2769276Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2769776Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2770435Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2771125Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2772174Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:28:21.2774072Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:28:21.2775183Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2776026Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2776822Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2777569Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2778305Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2779018Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2779751Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2780491Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2781220Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2781948Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters 
changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2782660Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2783391Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2784133Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2784860Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2785570Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2786296Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2786772Z ok (4.415s) 2023-01-11T22:28:21.2787199Z test_local_optimizer_parity_optimizer_class_str_Adam_maximize_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2787804Z When combined with DDP, check that a local optimizer gives the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59851 2023-01-11T22:28:21.2788349Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59852 2023-01-11T22:28:21.2788979Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2789433Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2789990Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2790457Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2791112Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2791539Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2792114Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2792631Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2793081Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2793537Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2794019Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2794514Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2795182Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2795852Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2796926Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:28:21.2798432Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:28:21.2799560Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2800308Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2801049Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2801770Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2802502Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2803238Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2803978Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2804710Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2805483Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2806207Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters 
changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2806994Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2807733Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2808463Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2809175Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2809905Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2810640Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2811106Z ok (4.420s) 2023-01-11T22:28:21.2811516Z test_local_optimizer_parity_optimizer_class_str_SGD_maximize_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2812135Z When combined with DDP, check that a local optimizer gives the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59934 2023-01-11T22:28:21.2812674Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59935 2023-01-11T22:28:21.2813525Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2813962Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2814541Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2815016Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2815579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2816023Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2816597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2817059Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2817485Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2817958Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2818446Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2818922Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2819585Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2820293Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2821371Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:28:21.2823009Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:28:21.2824112Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2824867Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2825578Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2826330Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2827059Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2827793Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2828521Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2829252Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2829988Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2830728Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters 
changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2831466Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2832194Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2832916Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2833668Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2834393Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2835114Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2835639Z ok (4.437s) 2023-01-11T22:28:21.2836047Z test_local_optimizer_parity_optimizer_class_str_SGD_maximize_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2836667Z When combined with DDP, check that a local optimizer gives the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60017 2023-01-11T22:28:21.2837245Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60018 2023-01-11T22:28:21.2837872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2838300Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2838876Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2839341Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2839904Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2840349Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2840921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2841382Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2841804Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2842292Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2842779Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2843257Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2843907Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2844614Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2847447Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:28:21.2849051Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:28:21.2850214Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2850980Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2851732Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2852464Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2853564Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2854433Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2855224Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2855954Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2856693Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2857416Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters 
changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2858155Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2858891Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2859628Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2860357Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2861075Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2861816Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2023-01-11T22:28:21.2862291Z ok (4.416s) 2023-01-11T22:28:21.2862670Z test_lr_scheduler (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2863218Z Check that a normal PyTorch ``lr_scheduler`` is usable with ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60100 2023-01-11T22:28:21.2863753Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60101 2023-01-11T22:28:21.2864426Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2864888Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2865444Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2865935Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2866538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2867127Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2867714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2868172Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2868618Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2869187Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2869656Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2870155Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2870876Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2871585Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2871961Z ok (4.013s) 2023-01-11T22:28:21.2872338Z test_multiple_param_groups (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2872953Z Check parity between constructing ZeRO with multiple parameter groups ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60184 2023-01-11T22:28:21.2873510Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60185 2023-01-11T22:28:21.2874099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2874551Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2875129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2875583Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2876165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2876613Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2877187Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2877635Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2878078Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2878550Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2879023Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2879520Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2880181Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed 
store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2880875Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2881251Z ok (4.515s) 2023-01-11T22:28:21.2881631Z test_nondefault_process_group (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2882371Z Check that ZeroRedundancyOptimizer works with a non-default process ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60268 2023-01-11T22:28:21.2882928Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60269 2023-01-11T22:28:21.2883519Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2883975Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2884556Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2885011Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2885591Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2886039Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2886688Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2887137Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2887576Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2888051Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2888627Z INFO:torch.testing._internal.common_distributed:Skipping `test_nondefault_process_group()` since 
world size of 2 is less than 4 2023-01-11T22:28:21.2889172Z INFO:torch.testing._internal.common_distributed:Skipping `test_nondefault_process_group()` since world size of 2 is less than 4 2023-01-11T22:28:21.2889556Z ok (2.511s) 2023-01-11T22:28:21.2889910Z test_sharding (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2890603Z Check ZeroRedundancyOptimizer's parameter sharding at construction ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60336 2023-01-11T22:28:21.2891163Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60337 2023-01-11T22:28:21.2891773Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2892225Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2892786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2893469Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2894060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2894486Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2895060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2895530Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2895973Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2896424Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2896904Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2897404Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2898060Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2898731Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2899128Z ok (2.509s) 2023-01-11T22:28:21.2899477Z test_step (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2900025Z Check that ZeroRedundancyOptimizer properly exposes the ``step()`` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60406 2023-01-11T22:28:21.2900574Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60407 2023-01-11T22:28:21.2901186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2901642Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2902187Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2902638Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2903214Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2903666Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2904356Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2904831Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2905276Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2905811Z 
INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2906308Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2906804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2907475Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2908143Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2908545Z ok (3.913s) 2023-01-11T22:28:21.2908910Z test_step_with_closure (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2909463Z Check that ZeroRedundancyOptimizer properly exposes the ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60489 2023-01-11T22:28:21.2910004Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60490 2023-01-11T22:28:21.2910620Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2911073Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2911628Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2912099Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2912676Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2913105Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2913676Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2914141Z 
warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2914582Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2915039Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2915525Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2916024Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2916683Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2917357Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2917755Z ok (3.914s) 2023-01-11T22:28:21.2918116Z test_zero_join_cpu (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2918658Z Check that the ZeRO join hook allows training with uneven inputs ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60572 2023-01-11T22:28:21.2919192Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60573 2023-01-11T22:28:21.2919806Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2920258Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2920813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2921352Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2921938Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2922389Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2923001Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2923486Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2923927Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2924383Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2924873Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2925368Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2926034Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2926700Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:28:21.2927228Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2927715Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2928366Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T22:28:21.2928807Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T22:28:21.2929397Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T22:28:21.2929847Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T22:28:21.2930116Z ok (2.609s) 2023-01-11T22:28:21.2930476Z test_zero_join_gpu (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2931034Z Check that the ZeRO join hook allows training with uneven inputs ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60650 2023-01-11T22:28:21.2931566Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60651 2023-01-11T22:28:21.2932158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2932611Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2933468Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2933922Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2934507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2934960Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2935534Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2935981Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled 
tests") 2023-01-11T22:28:21.2936429Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2936902Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2937375Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:28:21.2937866Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2938524Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2939316Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:28:21.2939825Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2940310Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:28:21.2941031Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T22:28:21.2941504Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T22:28:21.2942073Z /opt/conda/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up 2023-01-11T22:28:21.2942514Z _warnings.warn(warn_message, ResourceWarning) 2023-01-11T22:28:21.2942799Z ok (4.514s) 2023-01-11T22:28:21.2943198Z test_zero_model_parallel_parameters_as_bucket_view_False (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2943936Z Check that ZeRO works with model parallelism where the model's ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60734 2023-01-11T22:28:21.2944469Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60735 2023-01-11T22:28:21.2945075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2945509Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2946085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2946552Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2947132Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2947553Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2948129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2948589Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2949008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2949480Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2949872Z skip: Need at least 4 CUDA devices (2.510s) 2023-01-11T22:28:21.2950328Z test_zero_model_parallel_parameters_as_bucket_view_True (__main__.TestZeroRedundancyOptimizerDistributed) 2023-01-11T22:28:21.2951042Z Check that ZeRO works with model parallelism where the model's ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60802 2023-01-11T22:28:21.2951575Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60803 2023-01-11T22:28:21.2952185Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2952633Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2953186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2953651Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2954234Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2954662Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2955232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2955695Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2956198Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2956651Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:28:21.2957038Z skip: Need at least 4 CUDA devices (2.510s) 2023-01-11T22:28:21.2957434Z test_constructor (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:28:21.2958038Z Check the robustness of the ZeroRedundancyOptimizer constructor by ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60870 2023-01-11T22:28:21.2958757Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2959204Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2959776Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2960227Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2960665Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2961150Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2961810Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:28:21.2962188Z ok (2.409s) 2023-01-11T22:28:21.2962544Z test_lr_scheduler (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:28:21.2963091Z Check that a normal PyTorch ``lr_scheduler`` is usable with ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60905 2023-01-11T22:28:21.2963749Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2964199Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2964773Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2965248Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2965666Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2966157Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2966821Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:28:21.2967204Z ok (3.310s) 2023-01-11T22:28:21.2967573Z test_same_dense_param_type (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:28:21.2968157Z Check that ZeroRedundancyOptimizer raises an exception if the input ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60947 2023-01-11T22:28:21.2968861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2969296Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2969870Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2970338Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2970761Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2971251Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2971912Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:28:21.2972308Z ok (2.408s) 2023-01-11T22:28:21.2972642Z test_state_dict (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:28:21.2973427Z Check that ZeroRedundancyOptimizer exposes the expected state dict ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60982 2023-01-11T22:28:21.2974244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2974695Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2975252Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2975780Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2976236Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2976704Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2977369Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:28:21.2977768Z ok (3.211s) 2023-01-11T22:28:21.2978146Z test_step_with_extra_inner_key (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:28:21.2978711Z Check that ZeroRedundancyOptimizer wrapping an optimizer that adds ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61024 2023-01-11T22:28:21.2979416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2979867Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2980446Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2980894Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2981329Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2981818Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2982460Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:28:21.2982856Z ok (3.310s) 2023-01-11T22:28:21.2983216Z test_step_with_kwargs (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:28:21.2983772Z Check that the ``step(**kwargs)`` interface is properly exposed. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61066 2023-01-11T22:28:21.2984441Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2984892Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2985466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2985915Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2986352Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2986847Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2987506Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:28:21.2987883Z ok (3.310s) 2023-01-11T22:28:21.2988246Z test_step_without_closure (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:28:21.2988806Z Check that the ``step()`` method (without closure) is handled as ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61108 2023-01-11T22:28:21.2989489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2989915Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2990490Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2991026Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2991445Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2991932Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2992648Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:28:21.2993056Z ok (3.310s) 2023-01-11T22:28:21.2993384Z test_zero_grad (__main__.TestZeroRedundancyOptimizerSingleRank) 2023-01-11T22:28:21.2993916Z Check that the ``zero_grad`` method is properly handled. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61150 2023-01-11T22:28:21.2994596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:28:21.2995035Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:28:21.2995610Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:28:21.2996078Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:28:21.2996514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:28:21.2996985Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:28:21.2997639Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:28:21.2998036Z ok (2.508s) 2023-01-11T22:28:21.2998187Z 2023-01-11T22:28:21.2998442Z ---------------------------------------------------------------------- 2023-01-11T22:28:21.2998780Z Ran 42 tests in 157.498s 2023-01-11T22:28:21.2998946Z 2023-01-11T22:28:21.2999056Z OK (skipped=2) 2023-01-11T22:28:21.2999215Z 2023-01-11T22:28:21.2999343Z Generating XML reports... 
2023-01-11T22:28:21.3000045Z Generated XML report: test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer/TEST-TestZeroRedundancyOptimizerDistributed-20230111222543.xml 2023-01-11T22:28:21.3001021Z Generated XML report: test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer/TEST-TestZeroRedundancyOptimizerSingleRank-20230111222543.xml 2023-01-11T22:28:21.3001465Z 2023-01-11T22:28:21.3001813Z ##[endgroup] 2023-01-11T22:28:21.3002463Z FINISHED PRINTING LOG FILE of distributed/optim/test_zero_redundancy_optimizer (/var/lib/jenkins/workspace/test/test-reports/distributed-optim-test_zero_redundancy_optimizer_0gb5y_75) 2023-01-11T22:28:21.3002858Z 2023-01-11T22:28:21.3003155Z Running distributed/fsdp/test_fsdp_summon_full_params ... [2023-01-11 22:28:21.246653] 2023-01-11T22:28:21.3003872Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_summon_full_params.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 22:28:21.246999] 2023-01-11T22:31:25.2085290Z 2023-01-11T22:31:25.2086001Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_summon_full_params 2023-01-11T22:31:25.2089191Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_summon_full_params (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_summon_full_params_xc9_rtc3) 2023-01-11T22:31:25.2092967Z 2023-01-11T22:31:25.2093484Z Running tests... 2023-01-11T22:31:25.2094309Z ---------------------------------------------------------------------- 2023-01-11T22:31:25.2095350Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params 2023-01-11T22:31:25.2096340Z test_cannot_summon_full_params_from_backward (__main__.TestSummonFullParams) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:31:25.2097253Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61222 2023-01-11T22:31:25.2098146Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61223 2023-01-11T22:31:25.2098906Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2099365Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2100167Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2101048Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2101992Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2102818Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2103869Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2104666Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2105150Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2105654Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2106306Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2107001Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2107531Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2108008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2109266Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2110069Z warnings.warn( 2023-01-11T22:31:25.2111223Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2111980Z warnings.warn( 2023-01-11T22:31:25.2112235Z dist init r=0, world=2 2023-01-11T22:31:25.2112471Z dist init r=1, world=2 2023-01-11T22:31:25.2112719Z ok (5.442s) 2023-01-11T22:31:25.2113182Z test_cannot_summon_full_params_from_forward (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61305 2023-01-11T22:31:25.2113741Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61306 2023-01-11T22:31:25.2114397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2114843Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2115426Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2115902Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2116463Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2116914Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2117599Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2118074Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2118515Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2119072Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2119756Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2120456Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2120960Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2121443Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2122711Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2123490Z warnings.warn( 2023-01-11T22:31:25.2124638Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2125422Z warnings.warn( 2023-01-11T22:31:25.2125661Z dist init r=1, world=2 2023-01-11T22:31:25.2125985Z dist init r=0, world=2 2023-01-11T22:31:25.2126232Z ok (3.311s) 2023-01-11T22:31:25.2126576Z test_named_parameters_buffers_prefix__recurse_False (__main__.TestSummonFullParams) 2023-01-11T22:31:25.2127136Z Tests that ``named_parameters()`` and ``named_buffers()`` for a ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61384 2023-01-11T22:31:25.2127669Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61385 2023-01-11T22:31:25.2128269Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2128731Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2129313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2129793Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2130358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2130815Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2131392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2131868Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2132309Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2132812Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2133950Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2134764Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2135300Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2135784Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2136154Z dist init r=1, world=2 2023-01-11T22:31:25.2136397Z dist init r=0, world=2 2023-01-11T22:31:25.2136726Z ok (3.311s) 2023-01-11T22:31:25.2137103Z test_named_parameters_buffers_prefix__recurse_True (__main__.TestSummonFullParams) 2023-01-11T22:31:25.2137640Z Tests that ``named_parameters()`` and ``named_buffers()`` for a ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61463 2023-01-11T22:31:25.2138173Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61464 2023-01-11T22:31:25.2138802Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2139265Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2139831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2140317Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2140902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2141354Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2141907Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2142377Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2142835Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2143316Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for 
rank: 0 2023-01-11T22:31:25.2143977Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2144667Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2145199Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2145658Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2146009Z dist init r=1, world=2 2023-01-11T22:31:25.2146269Z dist init r=0, world=2 2023-01-11T22:31:25.2146495Z ok (3.311s) 2023-01-11T22:31:25.2146874Z test_named_parameters_buffers_prefix_test_prefix_recurse_False (__main__.TestSummonFullParams) 2023-01-11T22:31:25.2147433Z Tests that ``named_parameters()`` and ``named_buffers()`` for a ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61542 2023-01-11T22:31:25.2147963Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61543 2023-01-11T22:31:25.2148553Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2149576Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2150167Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2150640Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2151197Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2151641Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2152209Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 
2023-01-11T22:31:25.2152737Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2153189Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2153683Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2154400Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2155084Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2155611Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2156082Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2156440Z dist init r=0, world=2 2023-01-11T22:31:25.2156681Z dist init r=1, world=2 2023-01-11T22:31:25.2156922Z ok (3.311s) 2023-01-11T22:31:25.2157290Z test_named_parameters_buffers_prefix_test_prefix_recurse_True (__main__.TestSummonFullParams) 2023-01-11T22:31:25.2157829Z Tests that ``named_parameters()`` and ``named_buffers()`` for a ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61621 2023-01-11T22:31:25.2158350Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61622 2023-01-11T22:31:25.2158959Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2159392Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2159963Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2160429Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2161004Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2161431Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2162001Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2162460Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2162918Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2163392Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2164044Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2164726Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2165233Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2165699Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2166046Z dist init r=1, world=2 2023-01-11T22:31:25.2166296Z dist init r=0, world=2 2023-01-11T22:31:25.2166521Z ok (3.311s) 2023-01-11T22:31:25.2167040Z test_params_are_unflattenned_rank0_only_False_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61700 2023-01-11T22:31:25.2167642Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61701 2023-01-11T22:31:25.2168233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2168679Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2169253Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2169794Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2170358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2170802Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2171428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2171902Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2172335Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2172826Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2173961Z 
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2174636Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2175152Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2175622Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2175975Z dist init r=0, world=2 2023-01-11T22:31:25.2176214Z dist init r=1, world=2 2023-01-11T22:31:25.2176460Z ok (3.211s) 2023-01-11T22:31:25.2176976Z test_params_are_unflattenned_rank0_only_False_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61779 2023-01-11T22:31:25.2177560Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61780 2023-01-11T22:31:25.2178169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2178624Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2179199Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2179647Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2180224Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2180668Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2181218Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2181683Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:31:25.2182132Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2182662Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2183293Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2183977Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2184499Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2184967Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2185304Z dist init r=0, world=2 2023-01-11T22:31:25.2185555Z dist init r=1, world=2 2023-01-11T22:31:25.2185797Z ok (3.311s) 2023-01-11T22:31:25.2186299Z test_params_are_unflattenned_rank0_only_False_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61858 2023-01-11T22:31:25.2187006Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61859 2023-01-11T22:31:25.2187620Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2188071Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2188698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2189177Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2189763Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2190191Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2190760Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2191227Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2191678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2192151Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2192802Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2193521Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2194036Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2194486Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2195568Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:792: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2196243Z warnings.warn( 2023-01-11T22:31:25.2197206Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_unshard_param_utils.py:147: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2197845Z warnings.warn( 2023-01-11T22:31:25.2198812Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:792: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2199466Z warnings.warn( 2023-01-11T22:31:25.2200410Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_unshard_param_utils.py:147: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 
2023-01-11T22:31:25.2201045Z warnings.warn( 2023-01-11T22:31:25.2201273Z dist init r=1, world=2 2023-01-11T22:31:25.2201523Z dist init r=0, world=2 2023-01-11T22:31:25.2201764Z ok (3.311s) 2023-01-11T22:31:25.2202261Z test_params_are_unflattenned_rank0_only_False_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61937 2023-01-11T22:31:25.2202860Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61938 2023-01-11T22:31:25.2203553Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2204004Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2204559Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2205089Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2205679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2206104Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2206677Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2207138Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2207647Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2208119Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2208775Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2209464Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2209982Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2210431Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2211507Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:792: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2212180Z warnings.warn( 2023-01-11T22:31:25.2213619Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:792: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2214289Z warnings.warn( 2023-01-11T22:31:25.2215232Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_unshard_param_utils.py:147: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 
2023-01-11T22:31:25.2215888Z warnings.warn( 2023-01-11T22:31:25.2216840Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_unshard_param_utils.py:147: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2217485Z warnings.warn( 2023-01-11T22:31:25.2217722Z dist init r=1, world=2 2023-01-11T22:31:25.2217972Z dist init r=0, world=2 2023-01-11T22:31:25.2218209Z ok (3.311s) 2023-01-11T22:31:25.2218706Z test_params_are_unflattenned_rank0_only_True_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62016 2023-01-11T22:31:25.2219303Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62017 2023-01-11T22:31:25.2219906Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2220468Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2221033Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2221504Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2222183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2222643Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2223205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2223675Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2224133Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2224617Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2225275Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2225961Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2226486Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2226938Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2227287Z dist init r=0, world=2 2023-01-11T22:31:25.2227540Z dist init r=1, world=2 2023-01-11T22:31:25.2227761Z ok (3.211s) 2023-01-11T22:31:25.2228276Z test_params_are_unflattenned_rank0_only_True_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62095 2023-01-11T22:31:25.2228879Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62096 2023-01-11T22:31:25.2229490Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2229921Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2230500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2230970Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2231530Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2231972Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2232542Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2233010Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2233443Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2233935Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2234589Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2235272Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2235774Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2236243Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2236596Z dist init r=0, world=2 2023-01-11T22:31:25.2236831Z dist init r=1, world=2 2023-01-11T22:31:25.2237148Z ok (3.312s) 2023-01-11T22:31:25.2237665Z test_params_are_unflattenned_rank0_only_True_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62174 2023-01-11T22:31:25.2238269Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62175 2023-01-11T22:31:25.2238919Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2239378Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2239957Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2240412Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2240990Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2241550Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2242119Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2242565Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2243011Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2243505Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2244156Z 
INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2244825Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2245345Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2245824Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2246159Z dist init r=0, world=2 2023-01-11T22:31:25.2246409Z dist init r=1, world=2 2023-01-11T22:31:25.2246647Z ok (3.311s) 2023-01-11T22:31:25.2247165Z test_params_are_unflattenned_rank0_only_True_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62253 2023-01-11T22:31:25.2247747Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62254 2023-01-11T22:31:25.2248355Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2248803Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2249362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2249833Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2250413Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2250851Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2251402Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2251869Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:31:25.2252317Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2252806Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2253669Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2254357Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2254980Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2255433Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2255790Z dist init r=0, world=2 2023-01-11T22:31:25.2256045Z dist init r=1, world=2 2023-01-11T22:31:25.2256270Z ok (3.311s) 2023-01-11T22:31:25.2256844Z test_params_count_and_value_rank0_only_False_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62332 2023-01-11T22:31:25.2257461Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62333 2023-01-11T22:31:25.2258074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2258510Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2259085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2259550Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2260126Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2260555Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2261125Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2261587Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2262025Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2262516Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2263173Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2263857Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2264360Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2264834Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2265188Z dist init r=1, world=2 2023-01-11T22:31:25.2265443Z dist init r=0, world=2 2023-01-11T22:31:25.2265665Z ok (3.311s) 2023-01-11T22:31:25.2266174Z test_params_count_and_value_rank0_only_False_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62411 2023-01-11T22:31:25.2266768Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62412 2023-01-11T22:31:25.2267366Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2267813Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2268392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2268865Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2269427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2269870Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2270441Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2270886Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2271413Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2271904Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2272562Z 
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2273284Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2273816Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2274283Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2274639Z dist init r=1, world=2 2023-01-11T22:31:25.2274875Z dist init r=0, world=2 2023-01-11T22:31:25.2275119Z ok (3.311s) 2023-01-11T22:31:25.2275627Z test_params_count_and_value_rank0_only_False_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62490 2023-01-11T22:31:25.2276212Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62491 2023-01-11T22:31:25.2276827Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2277277Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2277858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2278311Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2278886Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2279330Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2279884Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2280346Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:31:25.2280795Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2281288Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2281926Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2282611Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2283161Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2283633Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2283976Z dist init r=0, world=2 2023-01-11T22:31:25.2284230Z dist init r=1, world=2 2023-01-11T22:31:25.2284470Z ok (3.311s) 2023-01-11T22:31:25.2284960Z test_params_count_and_value_rank0_only_False_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62569 2023-01-11T22:31:25.2285556Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62570 2023-01-11T22:31:25.2286165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2286613Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2287168Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2287632Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2288203Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2288710Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2289286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2289748Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2290256Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2290741Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2291404Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2292091Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2292615Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2293444Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2293802Z dist init r=1, world=2 2023-01-11T22:31:25.2294055Z dist init r=0, world=2 2023-01-11T22:31:25.2294279Z ok (3.312s) 2023-01-11T22:31:25.2294794Z test_params_count_and_value_rank0_only_True_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62648 2023-01-11T22:31:25.2295392Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62649 2023-01-11T22:31:25.2295987Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2296438Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2297013Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2297484Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2298047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2298490Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2299062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2299526Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2299958Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2300453Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2301101Z 
INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2301775Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2302292Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2302761Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2303118Z dist init r=1, world=2 2023-01-11T22:31:25.2303358Z dist init r=0, world=2 2023-01-11T22:31:25.2303598Z ok (3.312s) 2023-01-11T22:31:25.2304106Z test_params_count_and_value_rank0_only_True_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62727 2023-01-11T22:31:25.2304681Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62728 2023-01-11T22:31:25.2305287Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2305839Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2306423Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2306874Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2307522Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2307978Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2308553Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2308996Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:31:25.2309450Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2309948Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2310584Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2311264Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2311785Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2312260Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2312596Z dist init r=1, world=2 2023-01-11T22:31:25.2312845Z dist init r=0, world=2 2023-01-11T22:31:25.2313082Z ok (3.311s) 2023-01-11T22:31:25.2313571Z test_params_count_and_value_rank0_only_True_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62806 2023-01-11T22:31:25.2314167Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62807 2023-01-11T22:31:25.2314776Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2315226Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2315785Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2316250Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2316823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2317250Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2317821Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2318287Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2318738Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2319214Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2319865Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2320548Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2321070Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2321523Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2321867Z dist init r=0, world=2 2023-01-11T22:31:25.2322119Z dist init r=1, world=2 2023-01-11T22:31:25.2322341Z ok (3.311s) 2023-01-11T22:31:25.2322935Z test_params_count_and_value_rank0_only_True_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62885 2023-01-11T22:31:25.2323527Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62886 2023-01-11T22:31:25.2324138Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2324640Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2325228Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2325697Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2326260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2326701Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2327275Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2327738Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2328170Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2328662Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2329311Z 
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2329992Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2330496Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2330968Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2331325Z dist init r=0, world=2 2023-01-11T22:31:25.2331557Z dist init r=1, world=2 2023-01-11T22:31:25.2331795Z ok (3.211s) 2023-01-11T22:31:25.2332115Z test_raises_rank0_with_writeback (__main__.TestSummonFullParams) 2023-01-11T22:31:25.2332610Z Tests that ``summon_full_params()`` with both ``rank0_only=True`` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62964 2023-01-11T22:31:25.2333475Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62965 2023-01-11T22:31:25.2334092Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2334544Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2335099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2335569Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2336146Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2336584Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2337133Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2337603Z warnings.warn(f"loaded 
{len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2338050Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2338524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2339178Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2339858Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2340476Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2340927Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2341277Z dist init r=0, world=2 2023-01-11T22:31:25.2341529Z dist init r=1, world=2 2023-01-11T22:31:25.2341753Z ok (3.310s) 2023-01-11T22:31:25.2342356Z test_reshard_outside_forward_backward_iteration_rank0_only_False_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63043 2023-01-11T22:31:25.2343007Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63044 2023-01-11T22:31:25.2343620Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2344057Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2344622Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2345063Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2345630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2346083Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2346671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2347140Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2347574Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2348061Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2348716Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2349492Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2349992Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2350464Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2351724Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2352502Z warnings.warn( 2023-01-11T22:31:25.2353658Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2354417Z warnings.warn( 2023-01-11T22:31:25.2355142Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2355987Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2356833Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2357605Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2358377Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2359146Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2359935Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2360692Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2361481Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2362243Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2363022Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2363783Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2364567Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2365319Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2366099Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2366844Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2367625Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2368373Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2369227Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2370031Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2370828Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2371566Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2372355Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2373456Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2374250Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2375012Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2375782Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2376541Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2377323Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2378069Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2378858Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2379617Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2380390Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2381143Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2381924Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2382765Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2383643Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2384394Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2385179Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2385936Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2386260Z dist init r=1, world=2 2023-01-11T22:31:25.2386501Z dist init r=0, world=2 2023-01-11T22:31:25.2386739Z ok (3.813s) 2023-01-11T22:31:25.2387277Z test_reshard_outside_forward_backward_iteration_rank0_only_False_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63126 2023-01-11T22:31:25.2387905Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63127 2023-01-11T22:31:25.2388516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2388972Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2389548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2390021Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2390579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2391023Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2391598Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2392042Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2392494Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2393025Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2393690Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2394362Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2394888Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2395360Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2396627Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2397474Z warnings.warn( 2023-01-11T22:31:25.2398679Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2399463Z warnings.warn( 2023-01-11T22:31:25.2400187Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2400963Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2401761Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2402509Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2403299Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2404055Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2404850Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2405603Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2406367Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2407132Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2407919Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2408667Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2409564Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2410384Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2411155Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2411956Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2412752Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2413648Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2414439Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2415185Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2415973Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2416727Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2417518Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2418276Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2419061Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2419800Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2420591Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2421345Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2422127Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2422881Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2423742Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2424572Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2425371Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2426119Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2426912Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2427664Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2428431Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2429188Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2429971Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2430722Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2431050Z dist init r=1, world=2 2023-01-11T22:31:25.2431285Z dist init r=0, world=2 2023-01-11T22:31:25.2431524Z ok (3.813s) 2023-01-11T22:31:25.2432065Z test_reshard_outside_forward_backward_iteration_rank0_only_False_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63209 2023-01-11T22:31:25.2432674Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63210 2023-01-11T22:31:25.2433305Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2433763Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2434340Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2434794Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2435370Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2435812Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2436365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2436833Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2437282Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2437857Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2438501Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2439188Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2439762Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2440243Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2441494Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2442283Z warnings.warn( 2023-01-11T22:31:25.2443445Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2444210Z warnings.warn( 2023-01-11T22:31:25.2444936Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2445691Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2446488Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2447242Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2448034Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2448799Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2449594Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2450336Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2451118Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2451947Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2452789Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2453691Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2454464Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2455234Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2456025Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2456781Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2457562Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2458324Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2459093Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2459853Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2460634Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2461387Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2462462Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:792: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2463113Z warnings.warn( 2023-01-11T22:31:25.2463881Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2464636Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2465824Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_unshard_param_utils.py:147: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2466469Z warnings.warn( 2023-01-11T22:31:25.2467490Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:792: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2468159Z warnings.warn( 2023-01-11T22:31:25.2469116Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_unshard_param_utils.py:147: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2469767Z warnings.warn( 2023-01-11T22:31:25.2470483Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2471231Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2472020Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2472775Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2473565Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2474318Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2475087Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2475840Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2476632Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2477381Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2478163Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2478997Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2479812Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2480604Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2481390Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2482140Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2482466Z dist init r=0, world=2 2023-01-11T22:31:25.2482701Z dist init r=1, world=2 2023-01-11T22:31:25.2482943Z ok (3.813s) 2023-01-11T22:31:25.2483482Z test_reshard_outside_forward_backward_iteration_rank0_only_False_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63292 2023-01-11T22:31:25.2484115Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63293 2023-01-11T22:31:25.2484742Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2485199Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2485778Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2486235Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2486818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2487260Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2487815Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2488285Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2488738Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2489229Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2489863Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2490555Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2491075Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2491545Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2492795Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2493772Z warnings.warn( 2023-01-11T22:31:25.2495041Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2495881Z warnings.warn( 2023-01-11T22:31:25.2496616Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2497391Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2498175Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2498940Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2499732Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2500479Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2501275Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2502119Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2502905Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2503655Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2504437Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2505197Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2505983Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2506725Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2507505Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2508326Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2509156Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2509924Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2510694Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2511457Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2512530Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:792: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2513199Z warnings.warn( 2023-01-11T22:31:25.2513915Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2514663Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2515455Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2516209Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2517250Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_unshard_param_utils.py:147: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2517914Z warnings.warn( 2023-01-11T22:31:25.2518874Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:792: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2519538Z warnings.warn( 2023-01-11T22:31:25.2520493Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_unshard_param_utils.py:147: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2521209Z warnings.warn( 2023-01-11T22:31:25.2521927Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2522752Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2523528Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2524292Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2525089Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2525842Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2526632Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2527369Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2528159Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2528913Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2529757Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2530514Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2531283Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2532034Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2532826Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2533715Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2534132Z dist init r=1, world=2 2023-01-11T22:31:25.2534366Z dist init r=0, world=2 2023-01-11T22:31:25.2534611Z ok (3.813s) 2023-01-11T22:31:25.2535155Z test_reshard_outside_forward_backward_iteration_rank0_only_True_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63375 2023-01-11T22:31:25.2535768Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63376 2023-01-11T22:31:25.2536467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2536933Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2537518Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2537974Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2538551Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2539001Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2539576Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2540021Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2540479Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2540975Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2541614Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2542303Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2542831Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2543301Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2544551Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2545320Z warnings.warn( 2023-01-11T22:31:25.2546475Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2547246Z warnings.warn( 2023-01-11T22:31:25.2547971Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2548746Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2549516Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2550356Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2551192Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2551953Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2552747Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2553507Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2554284Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2555038Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2555824Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2556571Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2557360Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2558106Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2558892Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2559650Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2560433Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2561193Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2561964Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2562784Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2563619Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2564377Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2565163Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2565921Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2566689Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2567455Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2568239Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2568992Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2569776Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2570537Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2571301Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2572065Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2572841Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2573801Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2574592Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2575325Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2576193Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2577001Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2577798Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2578550Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2578863Z dist init r=0, world=2 2023-01-11T22:31:25.2579117Z dist init r=1, world=2 2023-01-11T22:31:25.2579355Z ok (3.813s) 2023-01-11T22:31:25.2579876Z test_reshard_outside_forward_backward_iteration_rank0_only_True_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63458 2023-01-11T22:31:25.2580499Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63459 2023-01-11T22:31:25.2581127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2581581Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2582139Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2582608Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2583192Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2583637Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2584191Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2584685Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2585148Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2585627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2586284Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2586969Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2587494Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2587948Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2589205Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2589974Z warnings.warn( 2023-01-11T22:31:25.2591129Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2591978Z warnings.warn( 2023-01-11T22:31:25.2592729Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2593506Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2594299Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2595059Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2595849Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2596613Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2597385Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2598150Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2598942Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2599693Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2600475Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2601234Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2602001Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2602755Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2603532Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2604343Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2605174Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2605922Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2606709Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2607468Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2608260Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2609017Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2609790Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2610551Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2611335Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2612087Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2613019Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2613789Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2614561Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2615313Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2616101Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2616936Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2617797Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2618551Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2619337Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2620093Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2620877Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2621631Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2622408Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2623149Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2623470Z dist init r=1, world=2 2023-01-11T22:31:25.2623719Z dist init r=0, world=2 2023-01-11T22:31:25.2623941Z ok (3.913s) 2023-01-11T22:31:25.2624478Z test_reshard_outside_forward_backward_iteration_rank0_only_True_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63541 2023-01-11T22:31:25.2625104Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63542 2023-01-11T22:31:25.2625735Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2626174Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2626750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2627221Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2627796Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2628224Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2628797Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2629264Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2629700Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2630190Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2630842Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2631534Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2632112Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2632582Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2633911Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2634698Z warnings.warn( 2023-01-11T22:31:25.2635859Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2636621Z warnings.warn( 2023-01-11T22:31:25.2637347Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2638109Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2638907Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2639666Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2640445Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2641201Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2641995Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2642747Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2643541Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2644302Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2645075Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2645908Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2646737Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2647496Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2648282Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2649045Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2649814Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2650574Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2651359Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2652112Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2653150Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2653915Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2654704Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2655457Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2656244Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2656999Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2657764Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2658614Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2659462Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2660227Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2661006Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2661772Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2662544Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2663299Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2664079Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2664825Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2665609Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2666351Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2667132Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2667884Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2668205Z dist init r=1, world=2 2023-01-11T22:31:25.2668442Z dist init r=0, world=2 2023-01-11T22:31:25.2668682Z ok (3.813s) 2023-01-11T22:31:25.2669222Z test_reshard_outside_forward_backward_iteration_rank0_only_True_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63624 2023-01-11T22:31:25.2669841Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63625 2023-01-11T22:31:25.2670456Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2670908Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2671489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2672035Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2672607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2673054Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2673722Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2674178Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2674631Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2675126Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2675785Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2676459Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2676981Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2677449Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2678715Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2679498Z warnings.warn( 2023-01-11T22:31:25.2680628Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2681401Z warnings.warn( 2023-01-11T22:31:25.2682129Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2682890Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2683690Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2684437Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2685260Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2686020Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2686814Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2687645Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2688460Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:401: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2689232Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2690025Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:403: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2690779Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2691574Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:407: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2692331Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2693289Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:408: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2694087Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2694872Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2695621Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2696403Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2697147Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2697929Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:414: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2698675Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2699462Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:416: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2700307Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2701154Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2701904Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2702689Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2703443Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2704221Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:425: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2704983Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2705747Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:427: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2706507Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2707293Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2708048Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2708828Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2709588Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2710361Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:437: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2711113Z self.assertEqual(0, outer_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2711895Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:438: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2712707Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2713033Z dist init r=0, world=2 2023-01-11T22:31:25.2713266Z dist init r=1, world=2 2023-01-11T22:31:25.2713504Z ok (3.813s) 2023-01-11T22:31:25.2713998Z test_summon_from_non_fsdp (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63707 2023-01-11T22:31:25.2714516Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63708 2023-01-11T22:31:25.2715144Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2715598Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2716159Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2716637Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2717217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2717663Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2718220Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2718688Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2719139Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2719633Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2720266Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2720957Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2721480Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2721934Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2722291Z dist init r=1, world=2 2023-01-11T22:31:25.2722541Z dist init r=0, world=2 2023-01-11T22:31:25.2722784Z ok (3.311s) 2023-01-11T22:31:25.2723284Z test_summon_full_param_recursive_recurse_False_summon_outer_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63786 2023-01-11T22:31:25.2723892Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63787 2023-01-11T22:31:25.2724507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2724943Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2725524Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2725989Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2726564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2726990Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2727568Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2728023Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2728472Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2728947Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2729692Z 
INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2730380Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2730614Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2730875Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2731918Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2732042Z warnings.warn( 2023-01-11T22:31:25.2733239Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2733361Z warnings.warn( 2023-01-11T22:31:25.2733478Z dist init r=0, world=2 2023-01-11T22:31:25.2733589Z dist init r=1, world=2 2023-01-11T22:31:25.2733692Z ok (3.311s) 2023-01-11T22:31:25.2734080Z test_summon_full_param_recursive_recurse_False_summon_outer_False_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63865 2023-01-11T22:31:25.2734305Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63866 2023-01-11T22:31:25.2734690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2734850Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2735225Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2735402Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2735783Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2735977Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2736351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2736542Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2736789Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2737024Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2737406Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2737805Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2738038Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2738268Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2739283Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2739495Z warnings.warn( 2023-01-11T22:31:25.2740575Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2740697Z warnings.warn( 2023-01-11T22:31:25.2740813Z dist init r=0, world=2 2023-01-11T22:31:25.2740925Z dist init r=1, world=2 2023-01-11T22:31:25.2741014Z ok (3.312s) 2023-01-11T22:31:25.2741402Z test_summon_full_param_recursive_recurse_False_summon_outer_True_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63944 2023-01-11T22:31:25.2741621Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63945 2023-01-11T22:31:25.2742002Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2742179Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2742558Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2742750Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2743115Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2743293Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2743655Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2743845Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2744089Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2744327Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2744726Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2745117Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2745344Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2745575Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2746591Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2746708Z warnings.warn( 2023-01-11T22:31:25.2747718Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2747880Z warnings.warn( 2023-01-11T22:31:25.2747995Z dist init r=1, world=2 2023-01-11T22:31:25.2748107Z dist init r=0, world=2 2023-01-11T22:31:25.2748208Z ok (3.312s) 2023-01-11T22:31:25.2748638Z test_summon_full_param_recursive_recurse_False_summon_outer_True_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64023 2023-01-11T22:31:25.2748866Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64024 2023-01-11T22:31:25.2749246Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2749426Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2749788Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2749982Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2750349Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2750524Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2750904Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2751092Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2751336Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2751572Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2751970Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2752349Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2752579Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2752806Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2753823Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2753940Z warnings.warn( 2023-01-11T22:31:25.2754943Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2755058Z warnings.warn( 2023-01-11T22:31:25.2755172Z dist init r=0, world=2 2023-01-11T22:31:25.2755285Z dist init r=1, world=2 2023-01-11T22:31:25.2755386Z ok (3.311s) 2023-01-11T22:31:25.2755771Z test_summon_full_param_recursive_recurse_True_summon_outer_False_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64102 2023-01-11T22:31:25.2755975Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64103 2023-01-11T22:31:25.2756348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2756592Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2756980Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2757174Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2757592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2757778Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2758166Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2758339Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2758587Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2758829Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2759230Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2759623Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2759855Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2760084Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2761097Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2761216Z warnings.warn( 2023-01-11T22:31:25.2762233Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2762348Z warnings.warn( 2023-01-11T22:31:25.2762444Z dist init r=0, world=2 2023-01-11T22:31:25.2762553Z dist init r=1, world=2 2023-01-11T22:31:25.2762653Z ok (3.312s) 2023-01-11T22:31:25.2763040Z test_summon_full_param_recursive_recurse_True_summon_outer_False_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64181 2023-01-11T22:31:25.2763267Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64182 2023-01-11T22:31:25.2763639Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2763815Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2764195Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2764387Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2764735Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2764908Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2765287Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2765547Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2765795Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2766036Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2766495Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2766907Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2767140Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2767351Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2768365Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2768486Z warnings.warn( 2023-01-11T22:31:25.2769499Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2769612Z warnings.warn( 2023-01-11T22:31:25.2769728Z dist init r=1, world=2 2023-01-11T22:31:25.2769838Z dist init r=0, world=2 2023-01-11T22:31:25.2769939Z ok (3.311s) 2023-01-11T22:31:25.2770321Z test_summon_full_param_recursive_recurse_True_summon_outer_True_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64260 2023-01-11T22:31:25.2770539Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64261 2023-01-11T22:31:25.2770900Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2771080Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2771457Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2771650Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2772017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2772197Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2772576Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2772768Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2773196Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2773425Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2773832Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2774228Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2774464Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2774784Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2775883Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2776008Z warnings.warn( 2023-01-11T22:31:25.2777028Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2777146Z warnings.warn( 2023-01-11T22:31:25.2777260Z dist init r=0, world=2 2023-01-11T22:31:25.2777371Z dist init r=1, world=2 2023-01-11T22:31:25.2777455Z ok (3.412s) 2023-01-11T22:31:25.2777842Z test_summon_full_param_recursive_recurse_True_summon_outer_True_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64339 2023-01-11T22:31:25.2778062Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64340 2023-01-11T22:31:25.2778435Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2778611Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2778987Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2779183Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2779550Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2779707Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2780091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2780285Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2780531Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2780768Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2781167Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2781565Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2781796Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2782024Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2783037Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2783137Z warnings.warn( 2023-01-11T22:31:25.2784230Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2784410Z warnings.warn( 2023-01-11T22:31:25.2784531Z dist init r=1, world=2 2023-01-11T22:31:25.2784643Z dist init r=0, world=2 2023-01-11T22:31:25.2784746Z ok (3.311s) 2023-01-11T22:31:25.2785097Z test_summon_full_param_shard_value_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64418 2023-01-11T22:31:25.2785316Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64419 2023-01-11T22:31:25.2785725Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2785889Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2786273Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2786468Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2786842Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2787017Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2787424Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2787614Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2787861Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2788104Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2788483Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2788882Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2789114Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2789343Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2789455Z dist init r=0, world=2 2023-01-11T22:31:25.2789565Z dist init r=1, world=2 2023-01-11T22:31:25.2789667Z ok (3.311s) 2023-01-11T22:31:25.2790016Z test_summon_full_param_shard_value_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64497 2023-01-11T22:31:25.2790222Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64498 2023-01-11T22:31:25.2790597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2790772Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2791160Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2791351Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2791716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2791889Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2792267Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2792529Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2792759Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2792999Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2793408Z INFO:torch.distributed.distributed_c10d:Rank 1: 
Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2793864Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2794105Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2794334Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2794450Z dist init r=0, world=2 2023-01-11T22:31:25.2794562Z dist init r=1, world=2 2023-01-11T22:31:25.2794651Z ok (3.311s) 2023-01-11T22:31:25.2794971Z test_summon_full_param_writeback (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64576 2023-01-11T22:31:25.2795189Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64577 2023-01-11T22:31:25.2795568Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2795747Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2796128Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2796320Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2796687Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2796843Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2797224Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2797414Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2797659Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 
0 2023-01-11T22:31:25.2797900Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2798301Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2798693Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2798923Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2799149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2799905Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2800651Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2801367Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2802172Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2802963Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2803716Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2804452Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2805189Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2805915Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2806639Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2807362Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2808171Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2808896Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2809616Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2810339Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:31:25.2811128Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2812197Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:1341: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:31:25.2812412Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2023-01-11T22:31:25.2813616Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:1341: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:31:25.2813834Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2023-01-11T22:31:25.2814584Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2815319Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2816050Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2816777Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2816895Z dist init r=1, world=2 2023-01-11T22:31:25.2817005Z dist init r=0, world=2 2023-01-11T22:31:25.2817107Z ok (3.411s) 2023-01-11T22:31:25.2817480Z test_summon_full_params_equivalence_rank0_only_False_offload_to_cpu_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64655 2023-01-11T22:31:25.2817704Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64656 2023-01-11T22:31:25.2818059Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2818240Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2818623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2818815Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2819182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2819358Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2819838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2820032Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2820281Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2820501Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2820968Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2821386Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2821621Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2821851Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2821970Z dist init r=1, world=2 2023-01-11T22:31:25.2822081Z dist init r=0, world=2 2023-01-11T22:31:25.2822183Z ok (3.311s) 2023-01-11T22:31:25.2822534Z test_summon_full_params_equivalence_rank0_only_False_offload_to_cpu_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64734 2023-01-11T22:31:25.2822756Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64735 2023-01-11T22:31:25.2823135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2823312Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2823697Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2823889Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2824257Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2824437Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2824816Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2824990Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2825238Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2825476Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2825873Z 
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2826269Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2826502Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2826734Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2827580Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:792: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2827696Z warnings.warn( 2023-01-11T22:31:25.2828518Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_unshard_param_utils.py:147: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2828683Z warnings.warn( 2023-01-11T22:31:25.2829518Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:792: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 
2023-01-11T22:31:25.2829633Z warnings.warn( 2023-01-11T22:31:25.2830491Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_unshard_param_utils.py:147: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2023-01-11T22:31:25.2830613Z warnings.warn( 2023-01-11T22:31:25.2830729Z dist init r=1, world=2 2023-01-11T22:31:25.2830846Z dist init r=0, world=2 2023-01-11T22:31:25.2830951Z ok (3.311s) 2023-01-11T22:31:25.2831320Z test_summon_full_params_equivalence_rank0_only_True_offload_to_cpu_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64813 2023-01-11T22:31:25.2831538Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64814 2023-01-11T22:31:25.2831899Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2832079Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2832461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2832653Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2833019Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2833198Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2833575Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2833766Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2833993Z INFO:torch.distributed.distributed_c10d:Added key: 
store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2834234Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2834634Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2835030Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2835261Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2835495Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2835609Z dist init r=1, world=2 2023-01-11T22:31:25.2835719Z dist init r=0, world=2 2023-01-11T22:31:25.2835804Z ok (3.311s) 2023-01-11T22:31:25.2836169Z test_summon_full_params_equivalence_rank0_only_True_offload_to_cpu_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64892 2023-01-11T22:31:25.2836390Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64893 2023-01-11T22:31:25.2836762Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2836938Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2837316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2837507Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2837992Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2838170Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2838528Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2838773Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2839028Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2839270Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2839675Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2840068Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2840304Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2840531Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2840645Z dist init r=1, world=2 2023-01-11T22:31:25.2840737Z dist init r=0, world=2 2023-01-11T22:31:25.2840839Z ok (3.311s) 2023-01-11T22:31:25.2841191Z test_summon_full_params_respects_reshard_after_forward (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64971 2023-01-11T22:31:25.2841411Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64972 2023-01-11T22:31:25.2841784Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2841960Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2842342Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2842535Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2842883Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2843058Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2843437Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2843628Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2843872Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2844109Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2844512Z INFO:torch.distributed.distributed_c10d:Rank 1: 
Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2844902Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2845130Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2845343Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2846365Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2846547Z warnings.warn( 2023-01-11T22:31:25.2847634Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2847752Z warnings.warn( 2023-01-11T22:31:25.2848340Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:271: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2848544Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2849131Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:273: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2849335Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2849907Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:271: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2850104Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2850685Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:273: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2850885Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2851465Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:279: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2851664Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2852241Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:281: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2852436Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2853203Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:279: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2853409Z outer_full_param_size, outer_param._full_param_padded.storage().size() 2023-01-11T22:31:25.2854052Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py:281: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T22:31:25.2854251Z self.assertEqual(0, inner_param._full_param_padded.storage().size()) 2023-01-11T22:31:25.2854423Z dist init r=1, world=2 2023-01-11T22:31:25.2854541Z dist init r=0, world=2 2023-01-11T22:31:25.2854645Z ok (3.812s) 2023-01-11T22:31:25.2854960Z test_summon_single_param (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65050 2023-01-11T22:31:25.2855181Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65051 2023-01-11T22:31:25.2855576Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2855740Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2856119Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2856314Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2856683Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2856859Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2857236Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2857426Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2857672Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2857914Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2858295Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2858689Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2858922Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2859151Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2860286Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2860404Z warnings.warn( 2023-01-11T22:31:25.2861419Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:31:25.2861533Z warnings.warn( 2023-01-11T22:31:25.2861648Z dist init r=0, world=2 2023-01-11T22:31:25.2861759Z dist init r=1, world=2 2023-01-11T22:31:25.2861843Z ok (3.311s) 2023-01-11T22:31:25.2862021Z test_with_grads_core (__main__.TestSummonFullParams) 2023-01-11T22:31:25.2862323Z Tests the core usage of ``summon_full_params(with_grads=True)``. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65129 2023-01-11T22:31:25.2862614Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65130 2023-01-11T22:31:25.2862995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2863174Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2863604Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2863805Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2864178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2864335Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2864714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2864912Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2865155Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2865393Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2865792Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2866188Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2866418Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2866645Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2866865Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2867101Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2867328Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2867555Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2867781Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2868006Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2868233Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2868459Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2868665Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2868891Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2869115Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2869338Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2869560Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2869784Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2870006Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:31:25.2870227Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2870449Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2870653Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2870948Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2871172Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2871395Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2871618Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2871887Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2872113Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:31:25.2872236Z dist init r=1, world=2 2023-01-11T22:31:25.2872349Z dist init r=0, world=2 2023-01-11T22:31:25.2872432Z ok (6.116s) 2023-01-11T22:31:25.2872622Z test_with_grads_none_grads (__main__.TestSummonFullParams) 2023-01-11T22:31:25.2873079Z Tests that if all ranks' ``FlatParameter`` has ``None`` gradient, then ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65212 2023-01-11T22:31:25.2873304Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65213 2023-01-11T22:31:25.2873679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2873858Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2874242Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2874435Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2874786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2874962Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2875339Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2875534Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2875780Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2876018Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:31:25.2876420Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:31:25.2876815Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:31:25.2877094Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2877302Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:31:25.2877420Z dist init r=1, world=2 2023-01-11T22:31:25.2877535Z dist init r=0, world=2 2023-01-11T22:31:25.2877637Z ok (3.511s) 2023-01-11T22:31:25.2877977Z test_summon_full_param_writeback (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65291 2023-01-11T22:31:25.2878355Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:31:25.2878532Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:31:25.2878914Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:31:25.2879088Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:31:25.2879337Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:31:25.2879733Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:31:25.2880034Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:31:25.2880577Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:31:25.2880692Z warnings.warn( 2023-01-11T22:31:25.2881499Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2882260Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2882996Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2883737Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2884465Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2885193Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2885918Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:31:25.2886056Z dist init r=0, world=1 2023-01-11T22:31:25.2886159Z ok (3.109s) 2023-01-11T22:31:25.2886180Z 2023-01-11T22:31:25.2886435Z ---------------------------------------------------------------------- 2023-01-11T22:31:25.2886554Z Ran 52 tests in 181.638s 2023-01-11T22:31:25.2886576Z 2023-01-11T22:31:25.2886670Z OK 2023-01-11T22:31:25.2886689Z 2023-01-11T22:31:25.2886815Z Generating XML reports... 2023-01-11T22:31:25.2887294Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params/TEST-TestSummonFullParams-20230111222823.xml 2023-01-11T22:31:25.2887799Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params/TEST-TestSummonFullParamsNoShard-20230111222823.xml 2023-01-11T22:31:25.2887819Z 2023-01-11T22:31:25.2888220Z ##[endgroup] 2023-01-11T22:31:25.2888718Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_summon_full_params (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_summon_full_params_xc9_rtc3) 2023-01-11T22:31:25.2888758Z 2023-01-11T22:31:25.2888992Z Running distributed/test_c10d_gloo ... [2023-01-11 22:31:25.210995] 2023-01-11T22:31:25.2889481Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_c10d_gloo.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:31:25.211363] 2023-01-11T22:44:49.4160687Z 2023-01-11T22:44:49.4161131Z Expand the folded group to see the log file of distributed/test_c10d_gloo 2023-01-11T22:44:49.4164312Z ##[group]PRINTING LOG FILE of distributed/test_c10d_gloo (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_gloo_6tcbezuo) 2023-01-11T22:44:49.4164920Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpibwqabz0 2023-01-11T22:44:49.4165990Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpibwqabz0/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4167361Z , <__main__.CommTest testMethod=test_broadcast_coalesced_gloo_cuda>, <__main__.CommTest testMethod=test_gloo_barrier_device_ids>, <__main__.CommTest testMethod=test_gloo_rank_membership>, <__main__.CommTest testMethod=test_gloo_warn_not_in_group>, <__main__.CommTest testMethod=test_sequence_num_incremented_gloo_default>, <__main__.CommTest testMethod=test_sequence_num_incremented_gloo_subgroup>, <__main__.CommTest testMethod=test_sequence_num_set_default_pg_gloo>, <__main__.CommTest testMethod=test_sequence_num_set_gloo_new_group>, <__main__.CommTest testMethod=test_tensor_dtype_complex>, <__main__.CommTest testMethod=test_tensor_dtype_mismatch>]> 2023-01-11T22:44:49.4168742Z test_broadcast_coalesced_gloo_cpu (__main__.CommTest) 2023-01-11T22:44:49.4169284Z test_broadcast_coalesced_gloo_cuda (__main__.CommTest) 2023-01-11T22:44:49.4169908Z test_gloo_barrier_device_ids (__main__.CommTest) 2023-01-11T22:44:49.4170477Z test_gloo_rank_membership (__main__.CommTest) 2023-01-11T22:44:49.4170829Z test_gloo_warn_not_in_group (__main__.CommTest) 2023-01-11T22:44:49.4171198Z test_sequence_num_incremented_gloo_default (__main__.CommTest) 2023-01-11T22:44:49.4171567Z test_sequence_num_incremented_gloo_subgroup (__main__.CommTest) 2023-01-11T22:44:49.4171938Z test_sequence_num_set_default_pg_gloo (__main__.CommTest) 2023-01-11T22:44:49.4172304Z 
test_sequence_num_set_gloo_new_group (__main__.CommTest) 2023-01-11T22:44:49.4173263Z test_tensor_dtype_complex (__main__.CommTest) 2023-01-11T22:44:49.4174732Z test_tensor_dtype_mismatch (__main__.CommTest) 2023-01-11T22:44:49.4176422Z , <__main__.CompilerTest testMethod=test_allgather_work_wait_gpu>, <__main__.CompilerTest testMethod=test_allreduce_work_wait_cpu>, <__main__.CompilerTest testMethod=test_allreduce_work_wait_gpu>, <__main__.CompilerTest testMethod=test_broadcast_work_wait_cpu>, <__main__.CompilerTest testMethod=test_broadcast_work_wait_gpu>, <__main__.CompilerTest testMethod=test_consecutive_comm_work_wait_cpu>, <__main__.CompilerTest testMethod=test_consecutive_comm_work_wait_gpu>, <__main__.CompilerTest testMethod=test_nested_comm_tensor_wrapping>, <__main__.CompilerTest testMethod=test_scatter_work_wait_cpu>, <__main__.CompilerTest testMethod=test_scatter_work_wait_gpu>]> 2023-01-11T22:44:49.4178125Z test_allgather_work_wait_cpu (__main__.CompilerTest) 2023-01-11T22:44:49.4178490Z test_allgather_work_wait_gpu (__main__.CompilerTest) 2023-01-11T22:44:49.4178850Z test_allreduce_work_wait_cpu (__main__.CompilerTest) 2023-01-11T22:44:49.4179422Z test_allreduce_work_wait_gpu (__main__.CompilerTest) 2023-01-11T22:44:49.4179813Z test_broadcast_work_wait_cpu (__main__.CompilerTest) 2023-01-11T22:44:49.4180306Z test_broadcast_work_wait_gpu (__main__.CompilerTest) 2023-01-11T22:44:49.4180672Z test_consecutive_comm_work_wait_cpu (__main__.CompilerTest) 2023-01-11T22:44:49.4181048Z test_consecutive_comm_work_wait_gpu (__main__.CompilerTest) 2023-01-11T22:44:49.4181419Z test_nested_comm_tensor_wrapping (__main__.CompilerTest) 2023-01-11T22:44:49.4181774Z test_scatter_work_wait_cpu (__main__.CompilerTest) 2023-01-11T22:44:49.4182100Z test_scatter_work_wait_gpu (__main__.CompilerTest) 2023-01-11T22:44:49.4190766Z , <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_dynamic_weight_sharing>, <__main__.DistributedDataParallelTest 
testMethod=test_ddp_checkpointing_once_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_once_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_static_graph_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_static_graph_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_unused_params_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_unused_params_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_weight_sharing_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_weight_sharing_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_future_passing_cpu>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_future_passing_gpu_gloo>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_register_just_once>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_sparse_gradients>, <__main__.DistributedDataParallelTest testMethod=test_ddp_invalid_comm_hook_init>, <__main__.DistributedDataParallelTest testMethod=test_ddp_invalid_comm_hook_return_type>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_when_unused_parameters_empty>, <__main__.DistributedDataParallelTest testMethod=test_global_local_unused_params_grad>, <__main__.DistributedDataParallelTest testMethod=test_global_local_unused_params_grad_with_grad_is_view>, <__main__.DistributedDataParallelTest 
testMethod=test_global_local_unused_params_grad_with_static_graph>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_1gpu_module_device_ids_integer_list>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_1gpu_module_device_ids_torch_device_list>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_2gpu_module>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_4gpu_module>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_cpu_module>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_cpu_module_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_ignored_output>, <__main__.DistributedDataParallelTest testMethod=test_ignored_output_with_unused_parameters>, <__main__.DistributedDataParallelTest testMethod=test_ignored_sharded_tensor>, <__main__.DistributedDataParallelTest testMethod=test_invalid_powerSGD_state>, <__main__.DistributedDataParallelTest testMethod=test_save_load_checkpoint>, <__main__.DistributedDataParallelTest testMethod=test_sparse_gradients>, <__main__.DistributedDataParallelTest testMethod=test_sparse_gradients_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_sync_batch_norm_empty_input>, <__main__.DistributedDataParallelTest testMethod=test_sync_batch_norm_only_empty_input>]> 2023-01-11T22:44:49.4196849Z test_ddp_checkpointing_dynamic_module (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4197683Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4198519Z test_ddp_checkpointing_once_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4199219Z test_ddp_checkpointing_once_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4199744Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4201008Z 
test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4201782Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4202679Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4203478Z test_ddp_checkpointing_twice_weight_sharing (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4204375Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4204923Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4205419Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4205935Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4206430Z test_ddp_comm_hook_future_passing_cpu (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4206903Z test_ddp_comm_hook_future_passing_gpu_gloo (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4207344Z test_ddp_comm_hook_register_just_once (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4207800Z test_ddp_comm_hook_sparse_gradients (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4208252Z test_ddp_invalid_comm_hook_init (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4208709Z test_ddp_invalid_comm_hook_return_type (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4209184Z test_find_unused_parameters_when_unused_parameters_empty (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4209697Z test_global_local_unused_params_grad (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4210477Z test_global_local_unused_params_grad_with_grad_is_view (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4211392Z 
test_global_local_unused_params_grad_with_static_graph (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4212271Z test_gloo_backend_1gpu_module_device_ids_integer_list (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4213635Z test_gloo_backend_1gpu_module_device_ids_torch_device_list (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4214317Z test_gloo_backend_2gpu_module (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4214745Z test_gloo_backend_4gpu_module (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4215478Z test_gloo_backend_cpu_module (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4216118Z test_gloo_backend_cpu_module_grad_is_view (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4216809Z test_ignored_output (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4217450Z test_ignored_output_with_unused_parameters (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4218133Z test_ignored_sharded_tensor (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4218829Z test_invalid_powerSGD_state (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4219491Z test_save_load_checkpoint (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4220238Z test_sparse_gradients (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4220681Z test_sparse_gradients_grad_is_view (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4221107Z test_sync_batch_norm_empty_input (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4221561Z test_sync_batch_norm_only_empty_input (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4222929Z , <__main__.GlooProcessGroupWithDispatchedCollectivesTests testMethod=test_allgather_coalesced>, <__main__.GlooProcessGroupWithDispatchedCollectivesTests testMethod=test_allreduce_coalesced>, <__main__.GlooProcessGroupWithDispatchedCollectivesTests testMethod=test_collectives>, <__main__.GlooProcessGroupWithDispatchedCollectivesTests 
testMethod=test_monitored_barrier>]> 2023-01-11T22:44:49.4224479Z test_all_to_all_single (__main__.GlooProcessGroupWithDispatchedCollectivesTests) 2023-01-11T22:44:49.4225073Z test_allgather_coalesced (__main__.GlooProcessGroupWithDispatchedCollectivesTests) 2023-01-11T22:44:49.4225935Z test_allreduce_coalesced (__main__.GlooProcessGroupWithDispatchedCollectivesTests) 2023-01-11T22:44:49.4226891Z test_collectives (__main__.GlooProcessGroupWithDispatchedCollectivesTests) 2023-01-11T22:44:49.4227569Z test_monitored_barrier (__main__.GlooProcessGroupWithDispatchedCollectivesTests) 2023-01-11T22:44:49.4228001Z 2023-01-11T22:44:49.4235206Z , <__main__.ProcessGroupGlooTest testMethod=test_allgather_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_coalesced_async>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_coalesced_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_noncontiguous_input>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_stress>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics_cuda_using_work_api>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics_using_work_api>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_async>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_basics>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_checks_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_stress>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_stress>, <__main__.ProcessGroupGlooTest 
testMethod=test_allreduce_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_barrier_implies_wait>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_basics>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_checks>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_stress>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_empty_tensors>, <__main__.ProcessGroupGlooTest testMethod=test_gather_basics>, <__main__.ProcessGroupGlooTest testMethod=test_gather_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_gather_checks>, <__main__.ProcessGroupGlooTest testMethod=test_gather_noncontiguous_input>, <__main__.ProcessGroupGlooTest testMethod=test_gather_stress>, <__main__.ProcessGroupGlooTest testMethod=test_gather_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_multi_device_constructor>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_basics>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_checks>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_stress>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_round_robin>, <__main__.ProcessGroupGlooTest testMethod=test_round_robin_create_destroy>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_basics>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_checks>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_stress>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_send_recv_all_to_all>, <__main__.ProcessGroupGlooTest testMethod=test_sparse_allreduce_basics>, <__main__.ProcessGroupGlooTest 
testMethod=test_sparse_allreduce_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_sparse_allreduce_checks>]> 2023-01-11T22:44:49.4242688Z test_allgather_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4243231Z test_allgather_basics_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4243746Z test_allgather_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4244142Z test_allgather_coalesced_async (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4244629Z test_allgather_coalesced_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4245266Z test_allgather_noncontiguous_input (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4245665Z test_allgather_stress (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4246187Z test_allgather_stress_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4246696Z test_allreduce_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4247213Z test_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4247642Z test_allreduce_basics_cuda_using_work_api (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4248154Z test_allreduce_basics_using_work_api (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4248817Z test_allreduce_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4249411Z test_allreduce_coalesced_async (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4249966Z test_allreduce_coalesced_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4250646Z test_allreduce_coalesced_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4251297Z test_allreduce_coalesced_checks_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4252038Z test_allreduce_coalesced_stress (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4252648Z test_allreduce_stress (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4253712Z test_allreduce_stress_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4254282Z test_barrier_implies_wait 
(__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4254678Z test_broadcast_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4255059Z test_broadcast_basics_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4255423Z test_broadcast_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4256079Z test_broadcast_stress (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4256532Z test_broadcast_stress_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4256886Z test_empty_tensors (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4257260Z test_gather_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4257705Z test_gather_basics_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4258215Z test_gather_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4258615Z test_gather_noncontiguous_input (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4259001Z test_gather_stress (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4259358Z test_gather_stress_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4260069Z test_multi_device_constructor (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4260475Z test_reduce_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4260852Z test_reduce_basics_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4261200Z test_reduce_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4261664Z test_reduce_stress (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4262132Z test_reduce_stress_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4262485Z test_round_robin (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4262873Z test_round_robin_create_destroy (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4263256Z test_scatter_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4263611Z test_scatter_basics_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4263984Z test_scatter_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4264346Z 
test_scatter_stress (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4264717Z test_scatter_stress_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4265211Z test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4265763Z test_sparse_allreduce_basics (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4266169Z test_sparse_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4266561Z test_sparse_allreduce_checks (__main__.ProcessGroupGlooTest) 2023-01-11T22:44:49.4267725Z , <__main__.ReducerTest testMethod=test_forward_backward_optimizer>, <__main__.ReducerTest testMethod=test_forward_backward_unused_parameters>, <__main__.ReducerTest testMethod=test_multi_dtype_multi_bucket>, <__main__.ReducerTest testMethod=test_multi_dtype_single_bucket>, <__main__.ReducerTest testMethod=test_single_dtype_single_bucket>]> 2023-01-11T22:44:49.4268731Z test_forward_backward (__main__.ReducerTest) 2023-01-11T22:44:49.4269090Z test_forward_backward_optimizer (__main__.ReducerTest) 2023-01-11T22:44:49.4269477Z test_forward_backward_unused_parameters (__main__.ReducerTest) 2023-01-11T22:44:49.4270029Z test_multi_dtype_multi_bucket (__main__.ReducerTest) 2023-01-11T22:44:49.4270395Z test_multi_dtype_single_bucket (__main__.ReducerTest) 2023-01-11T22:44:49.4270754Z test_single_dtype_single_bucket (__main__.ReducerTest) 2023-01-11T22:44:49.4271287Z ]> 2023-01-11T22:44:49.4271742Z test_logging_init (__main__.RendezvousEnvTest) 2023-01-11T22:44:49.4272080Z 2023-01-11T22:44:49.4272511Z ]> 2023-01-11T22:44:49.4273107Z test_default_store_timeout_gloo (__main__.TimeoutTest) 2023-01-11T22:44:49.4273829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4274293Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4274874Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 
disabled tests 2023-01-11T22:44:49.4275359Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4275834Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3y2d99tj 2023-01-11T22:44:49.4276392Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3y2d99tj/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4276703Z 2023-01-11T22:44:49.4276798Z Running tests... 2023-01-11T22:44:49.4277220Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4277763Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4278239Z test_broadcast_coalesced_gloo_cpu (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4278714Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65399 2023-01-11T22:44:49.4279179Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65400 2023-01-11T22:44:49.4279796Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4280241Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4280834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4281326Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4281912Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4282346Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4282931Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4283503Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4283957Z 
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps5awlmn1 2023-01-11T22:44:49.4284509Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps5awlmn1/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4285033Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4285594Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj6sjdr7k 2023-01-11T22:44:49.4286129Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj6sjdr7k/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4286650Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4287147Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4287653Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4288323Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4289017Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4289414Z ok (3.900s) 2023-01-11T22:44:49.4289569Z 2023-01-11T22:44:49.4289845Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4290165Z Ran 1 test in 3.900s 2023-01-11T22:44:49.4290329Z 2023-01-11T22:44:49.4290437Z OK 2023-01-11T22:44:49.4290573Z 2023-01-11T22:44:49.4290705Z Generating XML reports... 
2023-01-11T22:44:49.4291234Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223128.xml 2023-01-11T22:44:49.4291912Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4292376Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4293508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4294201Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4294679Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmca0ps2_ 2023-01-11T22:44:49.4295231Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmca0ps2_/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4295536Z 2023-01-11T22:44:49.4295628Z Running tests... 2023-01-11T22:44:49.4296052Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4296588Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4297082Z test_broadcast_coalesced_gloo_cuda (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4297541Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65510 2023-01-11T22:44:49.4297994Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65511 2023-01-11T22:44:49.4298609Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4299045Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4299632Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4300114Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4300700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4301131Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4301708Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4302318Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4302766Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfzbqrnx2 2023-01-11T22:44:49.4303313Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfzbqrnx2/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4303912Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7cm74bs2 2023-01-11T22:44:49.4304463Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7cm74bs2/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4304955Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4305438Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4305930Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4306434Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4307085Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4307774Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4308172Z ok (4.819s) 2023-01-11T22:44:49.4308328Z 2023-01-11T22:44:49.4308582Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4308921Z Ran 1 test in 4.820s 2023-01-11T22:44:49.4309085Z 2023-01-11T22:44:49.4309181Z OK 2023-01-11T22:44:49.4309319Z 2023-01-11T22:44:49.4309449Z Generating XML reports... 2023-01-11T22:44:49.4309976Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223135.xml 2023-01-11T22:44:49.4310647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4311114Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4311679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4312161Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4312629Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp15cezmqm 2023-01-11T22:44:49.4313162Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp15cezmqm/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4313464Z 2023-01-11T22:44:49.4313557Z Running tests... 
2023-01-11T22:44:49.4313977Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4314507Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4314977Z test_gloo_barrier_device_ids (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4315423Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65623 2023-01-11T22:44:49.4315875Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65624 2023-01-11T22:44:49.4316519Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4316960Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4317548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4318025Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4318606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4319035Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4319687Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4320172Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4320626Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjp6uxu2r 2023-01-11T22:44:49.4321172Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjp6uxu2r/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4321769Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy4sio37s 2023-01-11T22:44:49.4322317Z INFO:torch.distributed.nn.jit.instantiator:Writing 
/tmp/tmpy4sio37s/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4322813Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4323289Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4323781Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4324287Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4324948Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4325689Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4326108Z ok (3.928s) 2023-01-11T22:44:49.4326260Z 2023-01-11T22:44:49.4326535Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4326848Z Ran 1 test in 3.928s 2023-01-11T22:44:49.4327012Z 2023-01-11T22:44:49.4327110Z OK 2023-01-11T22:44:49.4327248Z 2023-01-11T22:44:49.4327377Z Generating XML reports... 
2023-01-11T22:44:49.4327923Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223142.xml 2023-01-11T22:44:49.4328573Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4329025Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4329603Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4330057Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4330528Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9xv22kwe 2023-01-11T22:44:49.4331066Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9xv22kwe/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4331366Z 2023-01-11T22:44:49.4331478Z Running tests... 2023-01-11T22:44:49.4331868Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4332397Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4333528Z test_gloo_rank_membership (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4334167Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65732 2023-01-11T22:44:49.4334600Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65733 2023-01-11T22:44:49.4335220Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4335677Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4336241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4336730Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4337317Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4337769Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4338445Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4338929Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4339587Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf5e0prvk 2023-01-11T22:44:49.4340216Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf5e0prvk/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4340747Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_nhyodze 2023-01-11T22:44:49.4341285Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_nhyodze/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4341798Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4342275Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4342751Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4343247Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4343925Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4344602Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4345139Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:44:49.4345638Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:44:49.4346296Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:44:49.4346987Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:44:49.4347397Z ok (4.035s) 2023-01-11T22:44:49.4347556Z 2023-01-11T22:44:49.4347853Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4348196Z Ran 1 test in 4.035s 2023-01-11T22:44:49.4348362Z 2023-01-11T22:44:49.4348439Z OK 2023-01-11T22:44:49.4348577Z 2023-01-11T22:44:49.4348708Z Generating XML reports... 
2023-01-11T22:44:49.4349262Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223148.xml 2023-01-11T22:44:49.4349932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4350366Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4350945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4351424Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4351872Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg2058xt3 2023-01-11T22:44:49.4352411Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg2058xt3/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4352710Z 2023-01-11T22:44:49.4352824Z Running tests... 2023-01-11T22:44:49.4353237Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4353753Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4354228Z test_gloo_warn_not_in_group (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4354690Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65844 2023-01-11T22:44:49.4355159Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65845 2023-01-11T22:44:49.4355775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4356323Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4356960Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4357437Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4358049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4358525Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4359123Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4359617Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4360072Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp49bvzttp 2023-01-11T22:44:49.4360627Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp49bvzttp/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4361156Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsn7464a5 2023-01-11T22:44:49.4361689Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsn7464a5/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4362176Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4362650Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4363136Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4363628Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4364270Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4365071Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4365604Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:44:49.4366091Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:44:49.4366725Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:44:49.4367409Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:44:49.4367804Z ok (4.839s) 2023-01-11T22:44:49.4367954Z 2023-01-11T22:44:49.4368208Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4368535Z Ran 1 test in 4.839s 2023-01-11T22:44:49.4368696Z 2023-01-11T22:44:49.4368797Z OK 2023-01-11T22:44:49.4368933Z 2023-01-11T22:44:49.4369060Z Generating XML reports... 
2023-01-11T22:44:49.4369585Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223154.xml 2023-01-11T22:44:49.4370244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4370693Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4371252Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4371723Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4372189Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp75mw2t97 2023-01-11T22:44:49.4372724Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp75mw2t97/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4373603Z 2023-01-11T22:44:49.4373828Z Running tests... 2023-01-11T22:44:49.4374256Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4374786Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4375266Z test_sequence_num_incremented_gloo_default (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4375745Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65958 2023-01-11T22:44:49.4376265Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65959 2023-01-11T22:44:49.4376902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4377333Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4377910Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4378385Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4378946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4379393Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4379965Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4380436Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4380964Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppwkw7okx 2023-01-11T22:44:49.4381503Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppwkw7okx/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4382034Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpws1r7hg4 2023-01-11T22:44:49.4382568Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpws1r7hg4/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4383062Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4383533Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4384017Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4384496Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4385163Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4385854Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4386380Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:44:49.4386849Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:44:49.4387507Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:44:49.4388193Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:44:49.4388587Z ok (4.835s) 2023-01-11T22:44:49.4388721Z 2023-01-11T22:44:49.4388998Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4389332Z Ran 1 test in 4.835s 2023-01-11T22:44:49.4389495Z 2023-01-11T22:44:49.4389591Z OK 2023-01-11T22:44:49.4389727Z 2023-01-11T22:44:49.4389834Z Generating XML reports... 
2023-01-11T22:44:49.4390372Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223202.xml 2023-01-11T22:44:49.4391032Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4391560Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4392129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4392598Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4393062Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptiahzjbi 2023-01-11T22:44:49.4393638Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptiahzjbi/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4393950Z 2023-01-11T22:44:49.4394064Z Running tests... 2023-01-11T22:44:49.4394478Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4395011Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4395488Z test_sequence_num_incremented_gloo_subgroup (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4395971Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66075 2023-01-11T22:44:49.4396424Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66076 2023-01-11T22:44:49.4397012Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4397464Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4398044Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4398515Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4399080Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4399529Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4400103Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4400570Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4401017Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf0kl2ayw 2023-01-11T22:44:49.4401555Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf0kl2ayw/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4402087Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7y5iufx4 2023-01-11T22:44:49.4402598Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7y5iufx4/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4403104Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4403574Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4403967Z 
skip: Need at least 4 CUDA devices (3.929s) 2023-01-11T22:44:49.4404149Z 2023-01-11T22:44:49.4404429Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4404759Z Ran 1 test in 3.929s 2023-01-11T22:44:49.4404924Z 2023-01-11T22:44:49.4405036Z OK (skipped=1) 2023-01-11T22:44:49.4405191Z 2023-01-11T22:44:49.4405298Z Generating XML reports... 2023-01-11T22:44:49.4405838Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223209.xml 2023-01-11T22:44:49.4406507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4406957Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4407514Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4407981Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4408446Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdfmx1248 2023-01-11T22:44:49.4409042Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdfmx1248/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4409340Z 2023-01-11T22:44:49.4409450Z Running tests... 2023-01-11T22:44:49.4409865Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4410405Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4410929Z test_sequence_num_set_default_pg_gloo (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4411411Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66178 2023-01-11T22:44:49.4411859Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66179 2023-01-11T22:44:49.4412468Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4413206Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4413808Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4414277Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4414838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4415286Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4415858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4416321Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4416767Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphxez5uy8 2023-01-11T22:44:49.4417308Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphxez5uy8/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4417844Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_i24a6wq 2023-01-11T22:44:49.4418353Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_i24a6wq/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4418856Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4419330Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4419818Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4420292Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4420952Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4421640Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4422040Z ok (3.924s) 2023-01-11T22:44:49.4422173Z 2023-01-11T22:44:49.4422444Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4422778Z Ran 1 test in 3.925s 2023-01-11T22:44:49.4422942Z 2023-01-11T22:44:49.4423039Z OK 2023-01-11T22:44:49.4423174Z 2023-01-11T22:44:49.4423282Z Generating XML reports... 2023-01-11T22:44:49.4423828Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223215.xml 2023-01-11T22:44:49.4424490Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4424939Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4425491Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4425989Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4426567Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1wv_kmi7 2023-01-11T22:44:49.4427086Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1wv_kmi7/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4427388Z 2023-01-11T22:44:49.4427497Z Running tests... 
2023-01-11T22:44:49.4427915Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4428517Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4429001Z test_sequence_num_set_gloo_new_group (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4429472Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66287 2023-01-11T22:44:49.4429922Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66288 2023-01-11T22:44:49.4430521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4430980Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4431555Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4432018Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4432577Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4433031Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4433600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4434045Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4434501Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnwpu71zv 2023-01-11T22:44:49.4435043Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnwpu71zv/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4435578Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5uw1e7r1 2023-01-11T22:44:49.4436092Z INFO:torch.distributed.nn.jit.instantiator:Writing 
/tmp/tmp5uw1e7r1/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4436595Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4437067Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4437552Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4438026Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4438680Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4439367Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4439886Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:44:49.4440375Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:44:49.4441027Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:44:49.4441715Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:44:49.4442092Z ok (4.031s) 2023-01-11T22:44:49.4442242Z 2023-01-11T22:44:49.4442511Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4442837Z Ran 1 test in 4.031s 2023-01-11T22:44:49.4443000Z 2023-01-11T22:44:49.4443094Z OK 2023-01-11T22:44:49.4443210Z 2023-01-11T22:44:49.4443336Z Generating XML reports... 
2023-01-11T22:44:49.4443960Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223221.xml 2023-01-11T22:44:49.4444628Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4445059Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4445684Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4446163Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4446626Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbqhaeswd 2023-01-11T22:44:49.4447149Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbqhaeswd/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4447451Z 2023-01-11T22:44:49.4447561Z Running tests... 2023-01-11T22:44:49.4447971Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4448488Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4448957Z test_tensor_dtype_complex (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4449418Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66402 2023-01-11T22:44:49.4449867Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66403 2023-01-11T22:44:49.4450457Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4450907Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4451485Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4451938Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4452510Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4453245Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4453839Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4454292Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4454760Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi1qa2nmv 2023-01-11T22:44:49.4455299Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi1qa2nmv/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4455829Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprbx1e52q 2023-01-11T22:44:49.4456347Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprbx1e52q/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4456857Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4457334Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4457799Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4458296Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4458959Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4459645Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4460019Z ok (4.027s) 2023-01-11T22:44:49.4460170Z 2023-01-11T22:44:49.4460440Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4460769Z Ran 1 test in 4.027s 2023-01-11T22:44:49.4460933Z 2023-01-11T22:44:49.4461009Z OK 2023-01-11T22:44:49.4461265Z 2023-01-11T22:44:49.4461393Z Generating XML reports... 2023-01-11T22:44:49.4461940Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223227.xml 2023-01-11T22:44:49.4462601Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4463070Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4463728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4464212Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4464658Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiaqnxitu 2023-01-11T22:44:49.4465198Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiaqnxitu/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4465503Z 2023-01-11T22:44:49.4465614Z Running tests... 
2023-01-11T22:44:49.4466036Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4466548Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4467019Z test_tensor_dtype_mismatch (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4467479Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66511 2023-01-11T22:44:49.4467916Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66512 2023-01-11T22:44:49.4468521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4468971Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4469544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4469995Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4470575Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4471017Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4471590Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4472039Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4472505Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph19t64j3 2023-01-11T22:44:49.4473044Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph19t64j3/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4473556Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplmp58v7j 2023-01-11T22:44:49.4474089Z INFO:torch.distributed.nn.jit.instantiator:Writing 
/tmp/tmplmp58v7j/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4474604Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4475075Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4475543Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4476033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4476692Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4477381Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4478401Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.4479098Z warnings.warn( 2023-01-11T22:44:49.4479981Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.4480601Z warnings.warn( 2023-01-11T22:44:49.4481495Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. 
If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.4482123Z warnings.warn( 2023-01-11T22:44:49.4482981Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.4483595Z warnings.warn( 2023-01-11T22:44:49.4483816Z ok (3.923s) 2023-01-11T22:44:49.4483964Z 2023-01-11T22:44:49.4484238Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4484570Z Ran 1 test in 3.923s 2023-01-11T22:44:49.4484733Z 2023-01-11T22:44:49.4484829Z OK 2023-01-11T22:44:49.4484950Z 2023-01-11T22:44:49.4485076Z Generating XML reports... 2023-01-11T22:44:49.4485621Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223234.xml 2023-01-11T22:44:49.4486285Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4486720Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4487293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4487768Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4488233Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2iz4prn2 2023-01-11T22:44:49.4488757Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2iz4prn2/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4489061Z 2023-01-11T22:44:49.4489171Z Running tests... 
2023-01-11T22:44:49.4489583Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4490097Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4490581Z test_allgather_work_wait_cpu (__main__.CompilerTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4491049Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66620 2023-01-11T22:44:49.4491497Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66621 2023-01-11T22:44:49.4492091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4492540Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4493306Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4493766Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4494349Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4494793Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4495363Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4495810Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4496379Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4w73jrs8 2023-01-11T22:44:49.4496916Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4w73jrs8/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4497448Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqgpi664g 2023-01-11T22:44:49.4497962Z INFO:torch.distributed.nn.jit.instantiator:Writing 
/tmp/tmpqgpi664g/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4498537Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4499026Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4499496Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4499989Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4500653Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4501343Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4502253Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4502977Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4503826Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4504537Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4504854Z ok (4.029s) 2023-01-11T22:44:49.4505004Z 2023-01-11T22:44:49.4505276Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4505608Z Ran 1 test in 4.029s 2023-01-11T22:44:49.4505771Z 2023-01-11T22:44:49.4505867Z OK 2023-01-11T22:44:49.4505983Z 2023-01-11T22:44:49.4506109Z Generating XML 
reports... 2023-01-11T22:44:49.4506670Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223240.xml 2023-01-11T22:44:49.4507344Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4507777Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4508347Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4508814Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4509284Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxtv0udt2 2023-01-11T22:44:49.4509806Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxtv0udt2/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4510109Z 2023-01-11T22:44:49.4510219Z Running tests... 2023-01-11T22:44:49.4510627Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4511139Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4511620Z test_allgather_work_wait_gpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4512084Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66729 2023-01-11T22:44:49.4512531Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66730 2023-01-11T22:44:49.4513117Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4513641Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4514220Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4514673Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4515300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4515753Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4516332Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4516777Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4517242Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbmi6jf9i 2023-01-11T22:44:49.4517781Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbmi6jf9i/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4518317Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqlg7ljt1 2023-01-11T22:44:49.4518831Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqlg7ljt1/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4519341Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4519820Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4520286Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4520778Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4521436Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4522122Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4523032Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4523760Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4524619Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4525337Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4525651Z ok (4.816s) 2023-01-11T22:44:49.4525801Z 2023-01-11T22:44:49.4526073Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4526442Z Ran 1 test in 4.816s 2023-01-11T22:44:49.4526605Z 2023-01-11T22:44:49.4526700Z OK 2023-01-11T22:44:49.4526818Z 2023-01-11T22:44:49.4526946Z Generating XML reports... 
2023-01-11T22:44:49.4527507Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223246.xml 2023-01-11T22:44:49.4528190Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4528629Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4529210Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4529682Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4530152Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5mfm7mf4 2023-01-11T22:44:49.4530745Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5mfm7mf4/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4531047Z 2023-01-11T22:44:49.4531161Z Running tests... 2023-01-11T22:44:49.4531575Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4532111Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4532628Z test_allreduce_work_wait_cpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4533338Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66840 2023-01-11T22:44:49.4533788Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66841 2023-01-11T22:44:49.4534390Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4534845Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4535425Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4535899Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4536457Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4536907Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4537488Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4537943Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4538407Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpna2rxz00 2023-01-11T22:44:49.4538950Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpna2rxz00/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4539459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4539949Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvfzhjx9r 2023-01-11T22:44:49.4540489Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvfzhjx9r/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4540998Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4541490Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4541970Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4542631Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4543318Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4544249Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4544952Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4545805Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4546509Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4547351Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4548183Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4549007Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 
2023-01-11T22:44:49.4549773Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4550111Z ok (3.944s) 2023-01-11T22:44:49.4550263Z 2023-01-11T22:44:49.4550542Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4550858Z Ran 1 test in 3.944s 2023-01-11T22:44:49.4551018Z 2023-01-11T22:44:49.4551115Z OK 2023-01-11T22:44:49.4551249Z 2023-01-11T22:44:49.4551375Z Generating XML reports... 2023-01-11T22:44:49.4551922Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223253.xml 2023-01-11T22:44:49.4552600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4553058Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4553638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4554086Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4554561Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvwcyxwbo 2023-01-11T22:44:49.4555109Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvwcyxwbo/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4555413Z 2023-01-11T22:44:49.4555504Z Running tests... 2023-01-11T22:44:49.4555919Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4556458Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4556956Z test_allreduce_work_wait_gpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4557410Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66949 2023-01-11T22:44:49.4557861Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66950 2023-01-11T22:44:49.4558479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4558915Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4559500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4559977Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4560561Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4560988Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4561571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4562040Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4562512Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1jnwz7m1 2023-01-11T22:44:49.4563040Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1jnwz7m1/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4563583Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprb2atv4u 2023-01-11T22:44:49.4564120Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprb2atv4u/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4564609Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4565083Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4565650Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4566149Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4566797Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4567573Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4568516Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4569242Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4570074Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4570801Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4571656Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4572366Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4573406Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 
2023-01-11T22:44:49.4574117Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4574453Z ok (4.934s) 2023-01-11T22:44:49.4574609Z 2023-01-11T22:44:49.4574881Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4575219Z Ran 1 test in 4.934s 2023-01-11T22:44:49.4575362Z 2023-01-11T22:44:49.4575460Z OK 2023-01-11T22:44:49.4575600Z 2023-01-11T22:44:49.4575728Z Generating XML reports... 2023-01-11T22:44:49.4576294Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223300.xml 2023-01-11T22:44:49.4576958Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4577416Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4577997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4578477Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4578927Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu8900tcc 2023-01-11T22:44:49.4579474Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu8900tcc/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4579777Z 2023-01-11T22:44:49.4579889Z Running tests... 2023-01-11T22:44:49.4580284Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4580826Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4581314Z test_broadcast_work_wait_cpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4581789Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67060 2023-01-11T22:44:49.4582224Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67061 2023-01-11T22:44:49.4582840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4583397Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4583986Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4584442Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4585095Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4585559Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4586127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4586597Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4587066Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu1c7yjbw 2023-01-11T22:44:49.4587618Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu1c7yjbw/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4588134Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzh9o_wjz 2023-01-11T22:44:49.4588673Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzh9o_wjz/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4589187Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4589663Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4590134Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4590630Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4591294Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4591969Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4592903Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4593630Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4594490Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4595195Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4595502Z ok (4.032s) 2023-01-11T22:44:49.4595659Z 2023-01-11T22:44:49.4595932Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4596267Z Ran 1 test in 4.032s 2023-01-11T22:44:49.4596430Z 2023-01-11T22:44:49.4596508Z OK 2023-01-11T22:44:49.4596644Z 2023-01-11T22:44:49.4596773Z Generating XML reports... 
2023-01-11T22:44:49.4597336Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223307.xml 2023-01-11T22:44:49.4598022Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4598457Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4599039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4599512Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4599983Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcxcfi9xt 2023-01-11T22:44:49.4600580Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcxcfi9xt/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4600885Z 2023-01-11T22:44:49.4601000Z Running tests... 2023-01-11T22:44:49.4601421Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4601935Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4602476Z test_broadcast_work_wait_gpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4602962Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67169 2023-01-11T22:44:49.4603410Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67170 2023-01-11T22:44:49.4604006Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4604463Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4605049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4605502Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4606087Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4606564Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4607142Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4607592Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4608062Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp45wrqrx_ 2023-01-11T22:44:49.4608605Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp45wrqrx_/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4609145Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplu7tp8a3 2023-01-11T22:44:49.4609662Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplu7tp8a3/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4610174Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4610649Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4611120Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4611619Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4612274Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4613201Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4614127Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4614852Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4615712Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4616429Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4616757Z ok (4.849s) 2023-01-11T22:44:49.4616890Z 2023-01-11T22:44:49.4617163Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4617592Z Ran 1 test in 4.849s 2023-01-11T22:44:49.4617760Z 2023-01-11T22:44:49.4617859Z OK 2023-01-11T22:44:49.4617997Z 2023-01-11T22:44:49.4618107Z Generating XML reports... 
2023-01-11T22:44:49.4618674Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223313.xml 2023-01-11T22:44:49.4619348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4619869Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4620456Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4620931Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4621407Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw8k9m1r7 2023-01-11T22:44:49.4621927Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw8k9m1r7/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4622235Z 2023-01-11T22:44:49.4622352Z Running tests... 2023-01-11T22:44:49.4622760Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4623292Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4623770Z test_consecutive_comm_work_wait_cpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4624253Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67280 2023-01-11T22:44:49.4624713Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67281 2023-01-11T22:44:49.4625303Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4625756Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4626335Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4626844Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4627410Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4627862Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4628443Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4628913Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4629361Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkfrnul5m 2023-01-11T22:44:49.4629887Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf36b59nk 2023-01-11T22:44:49.4630421Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkfrnul5m/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4630954Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf36b59nk/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4631465Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4631939Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4632430Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4632911Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4633571Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4634267Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4635199Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4635984Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4636889Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4637616Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4638462Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant2 target _tensor_constant2 _tensor_constant2 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4639169Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4640008Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant3 target _tensor_constant3 _tensor_constant3 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 
2023-01-11T22:44:49.4640718Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4641569Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4642277Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4643096Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4643816Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4644664Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant2 target _tensor_constant2 _tensor_constant2 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4645374Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4646217Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant3 target _tensor_constant3 _tensor_constant3 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4646896Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4647233Z ok (4.040s) 2023-01-11T22:44:49.4649581Z 2023-01-11T22:44:49.4649877Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4650218Z Ran 1 test in 4.041s 2023-01-11T22:44:49.4650381Z 2023-01-11T22:44:49.4650477Z OK 2023-01-11T22:44:49.4650609Z 
2023-01-11T22:44:49.4650735Z Generating XML reports... 2023-01-11T22:44:49.4651277Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223320.xml 2023-01-11T22:44:49.4651958Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4652411Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4653198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4653683Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4654147Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzknmglkn 2023-01-11T22:44:49.4654797Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzknmglkn/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4655080Z 2023-01-11T22:44:49.4655191Z Running tests... 2023-01-11T22:44:49.4655612Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4656144Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4656685Z test_consecutive_comm_work_wait_gpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4657180Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67389 2023-01-11T22:44:49.4657633Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67390 2023-01-11T22:44:49.4658242Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4658679Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4659261Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4659731Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4660306Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4660741Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4661314Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4661783Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4662227Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc07j5e8o 2023-01-11T22:44:49.4662766Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc07j5e8o/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4663283Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4663779Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpewgz99mq 2023-01-11T22:44:49.4664295Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpewgz99mq/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4664803Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4665289Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4665781Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4666422Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4667116Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4668041Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4668764Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4669603Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4670312Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4671151Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant2 target _tensor_constant2 _tensor_constant2 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4671951Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4672775Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant3 target _tensor_constant3 _tensor_constant3 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 
2023-01-11T22:44:49.4673531Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4674386Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4675097Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4675956Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4676643Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4677493Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant2 target _tensor_constant2 _tensor_constant2 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4678199Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4679043Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant3 target _tensor_constant3 _tensor_constant3 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4679746Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4680062Z ok (4.942s) 2023-01-11T22:44:49.4680212Z 2023-01-11T22:44:49.4680487Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4680821Z Ran 1 test in 4.942s 2023-01-11T22:44:49.4680981Z 2023-01-11T22:44:49.4681059Z OK 2023-01-11T22:44:49.4681197Z 
2023-01-11T22:44:49.4681327Z Generating XML reports... 2023-01-11T22:44:49.4681887Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223327.xml 2023-01-11T22:44:49.4682563Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4682997Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4683575Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4684051Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4684500Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl5ij8tfx 2023-01-11T22:44:49.4685043Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl5ij8tfx/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4685344Z 2023-01-11T22:44:49.4685453Z Running tests... 2023-01-11T22:44:49.4685863Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4686378Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4686871Z test_nested_comm_tensor_wrapping (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4687344Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67500 2023-01-11T22:44:49.4687774Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67501 2023-01-11T22:44:49.4688461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4688912Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4689486Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4689939Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4690570Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4691026Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4691588Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4692058Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4692528Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwp6vxwqj 2023-01-11T22:44:49.4693287Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwp6vxwqj/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4693807Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpup73b7op 2023-01-11T22:44:49.4694341Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpup73b7op/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4694850Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4695321Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4695791Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4696284Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4696951Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4697627Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4698552Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4699275Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4700122Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4700830Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4701661Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4702360Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4703210Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 
2023-01-11T22:44:49.4703919Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4704230Z ok (4.028s) 2023-01-11T22:44:49.4704381Z 2023-01-11T22:44:49.4704657Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4704987Z Ran 1 test in 4.028s 2023-01-11T22:44:49.4705244Z 2023-01-11T22:44:49.4705342Z OK 2023-01-11T22:44:49.4705459Z 2023-01-11T22:44:49.4705588Z Generating XML reports... 2023-01-11T22:44:49.4706155Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223334.xml 2023-01-11T22:44:49.4706829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4707326Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4707922Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4708393Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4708859Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1v_y42b7 2023-01-11T22:44:49.4709378Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1v_y42b7/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4709683Z 2023-01-11T22:44:49.4709796Z Running tests... 2023-01-11T22:44:49.4710204Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4710717Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4711199Z test_scatter_work_wait_cpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4711666Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67609 2023-01-11T22:44:49.4712118Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67610 2023-01-11T22:44:49.4712705Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4713151Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4713724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4714194Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4714754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4715199Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4715769Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4716218Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4716684Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi2j8tub9 2023-01-11T22:44:49.4717223Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi2j8tub9/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4717748Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptb9w8z6x 2023-01-11T22:44:49.4718262Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptb9w8z6x/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4718776Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4719245Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4719714Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4720211Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4720866Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4721556Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4722459Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4723259Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4724191Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4724912Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4725242Z ok (4.025s) 2023-01-11T22:44:49.4725373Z 2023-01-11T22:44:49.4725643Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4725969Z Ran 1 test in 4.026s 2023-01-11T22:44:49.4726133Z 2023-01-11T22:44:49.4726229Z OK 2023-01-11T22:44:49.4726344Z 2023-01-11T22:44:49.4726473Z Generating XML reports... 
2023-01-11T22:44:49.4727036Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223340.xml 2023-01-11T22:44:49.4727744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4728174Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4728752Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4729225Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4729692Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa21ccwfg 2023-01-11T22:44:49.4730217Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa21ccwfg/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4730520Z 2023-01-11T22:44:49.4730634Z Running tests... 2023-01-11T22:44:49.4731042Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4731580Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4732043Z test_scatter_work_wait_gpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4732509Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67718 2023-01-11T22:44:49.4733334Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67719 2023-01-11T22:44:49.4733979Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4734433Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4735009Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4735482Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4736038Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4736486Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4737059Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4737507Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4737978Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuy6disg9 2023-01-11T22:44:49.4738520Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuy6disg9/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4739055Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu3n_e1op 2023-01-11T22:44:49.4739568Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu3n_e1op/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4740079Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4740659Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4741145Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4741619Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4742351Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4743064Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4743969Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4744687Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4745538Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1346: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2023-01-11T22:44:49.4746249Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2023-01-11T22:44:49.4746583Z ok (4.844s) 2023-01-11T22:44:49.4746718Z 2023-01-11T22:44:49.4746987Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4747319Z Ran 1 test in 4.844s 2023-01-11T22:44:49.4747484Z 2023-01-11T22:44:49.4747579Z OK 2023-01-11T22:44:49.4747715Z 2023-01-11T22:44:49.4747822Z Generating XML reports... 
2023-01-11T22:44:49.4748377Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223347.xml 2023-01-11T22:44:49.4749050Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4749504Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4750061Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4750528Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4750996Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0lcem53a 2023-01-11T22:44:49.4751519Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0lcem53a/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4751818Z 2023-01-11T22:44:49.4751927Z Running tests... 2023-01-11T22:44:49.4752330Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4752856Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4753313Z test_ddp_checkpointing_dynamic_module (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4753922Z Dynamic module can be checkpointed, multiple times, with non-reentrant ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4754423Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67829 2023-01-11T22:44:49.4754854Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67830 2023-01-11T22:44:49.4755463Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4755916Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4756493Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4756943Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4757520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4758038Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4758600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4759067Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4814644Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4x7fp5dj 2023-01-11T22:44:49.4815287Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4x7fp5dj/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4815817Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb0nbo4wx 2023-01-11T22:44:49.4816336Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb0nbo4wx/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4816829Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4817279Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4817747Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4818238Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4818923Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4819601Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4819987Z ok (5.355s) 2023-01-11T22:44:49.4820129Z 2023-01-11T22:44:49.4820391Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4820696Z Ran 1 test in 5.355s 2023-01-11T22:44:49.4820850Z 2023-01-11T22:44:49.4820934Z OK 2023-01-11T22:44:49.4821061Z 2023-01-11T22:44:49.4821177Z Generating XML reports... 2023-01-11T22:44:49.4821782Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223354.xml 2023-01-11T22:44:49.4822488Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4822919Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4823487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4823941Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4824388Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp73kvnsic 2023-01-11T22:44:49.4824914Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp73kvnsic/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4825202Z 2023-01-11T22:44:49.4825305Z Running tests... 
2023-01-11T22:44:49.4825696Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4826209Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4826678Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4827172Z Dynamic module can be checkpointed multiple times with weight sharing ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4827657Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67944 2023-01-11T22:44:49.4828123Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67945 2023-01-11T22:44:49.4828713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4829141Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4829710Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4830277Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4830839Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4831272Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4831881Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4832342Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4832785Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3zi4r04h 2023-01-11T22:44:49.4833306Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3zi4r04h/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4833819Z INFO:torch.distributed.nn.jit.instantiator:Created a 
temporary directory at /tmp/tmpgr5ytb3e 2023-01-11T22:44:49.4834330Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgr5ytb3e/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4834835Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4835299Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4835772Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4836268Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4836920Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4837595Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4837972Z ok (5.338s) 2023-01-11T22:44:49.4838117Z 2023-01-11T22:44:49.4838372Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4838689Z Ran 1 test in 5.338s 2023-01-11T22:44:49.4838843Z 2023-01-11T22:44:49.4838932Z OK 2023-01-11T22:44:49.4839057Z 2023-01-11T22:44:49.4839175Z Generating XML reports... 
2023-01-11T22:44:49.4839774Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223402.xml 2023-01-11T22:44:49.4840497Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4840935Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4841485Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4841939Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4842384Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpizh7gu3s 2023-01-11T22:44:49.4842915Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpizh7gu3s/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4843198Z 2023-01-11T22:44:49.4843301Z Running tests... 2023-01-11T22:44:49.4843693Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4844208Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4844670Z test_ddp_checkpointing_once_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4845160Z DDP works as expected when layer is checkpointed only once. ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4845626Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68059 2023-01-11T22:44:49.4846071Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68060 2023-01-11T22:44:49.4846657Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4847190Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4847758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4848206Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4848764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4849247Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4849820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4850264Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4850712Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp673zzszo 2023-01-11T22:44:49.4851234Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp673zzszo/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4851749Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd7sddgqm 2023-01-11T22:44:49.4852268Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd7sddgqm/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4852768Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4853466Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4853929Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4854406Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4855053Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4855723Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4856230Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4856692Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4857151Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4857625Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4858768Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:44:49.4859489Z warnings.warn( 2023-01-11T22:44:49.4860518Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 
2023-01-11T22:44:49.4861218Z warnings.warn( 2023-01-11T22:44:49.4861571Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4862043Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4862509Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4862971Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4863422Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4863985Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4864437Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4864879Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4865209Z ok (5.425s) 2023-01-11T22:44:49.4865349Z 2023-01-11T22:44:49.4865696Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4866035Z Ran 1 test in 5.425s 2023-01-11T22:44:49.4866180Z 2023-01-11T22:44:49.4866267Z OK 2023-01-11T22:44:49.4866396Z 2023-01-11T22:44:49.4866515Z Generating XML reports... 
2023-01-11T22:44:49.4867129Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223409.xml 2023-01-11T22:44:49.4867833Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4868278Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4868843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4869295Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4869732Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbpkoe8go 2023-01-11T22:44:49.4870261Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbpkoe8go/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4870557Z 2023-01-11T22:44:49.4870665Z Running tests... 2023-01-11T22:44:49.4871061Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4871570Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4872034Z test_ddp_checkpointing_once_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4872525Z DDP works as expected when layer is checkpointed only once. ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4872976Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68174 2023-01-11T22:44:49.4873409Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68175 2023-01-11T22:44:49.4873997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4874438Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4874990Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4875444Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4876008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4876426Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4876987Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4877440Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4877890Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphdj864wo 2023-01-11T22:44:49.4878411Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphdj864wo/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4878901Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4879389Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkupvokr5 2023-01-11T22:44:49.4879899Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkupvokr5/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4880393Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4880937Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4881418Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4882054Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4882784Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4883311Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4883780Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4884238Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4884703Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4885859Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:44:49.4886559Z warnings.warn( 2023-01-11T22:44:49.4887586Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 
2023-01-11T22:44:49.4888293Z warnings.warn( 2023-01-11T22:44:49.4888655Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4889122Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4889579Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4890044Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4890502Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4890965Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4891413Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4891869Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4892201Z ok (5.550s) 2023-01-11T22:44:49.4892341Z 2023-01-11T22:44:49.4892604Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4893132Z Ran 1 test in 5.550s 2023-01-11T22:44:49.4893295Z 2023-01-11T22:44:49.4893383Z OK 2023-01-11T22:44:49.4893510Z 2023-01-11T22:44:49.4893626Z Generating XML reports... 
2023-01-11T22:44:49.4894225Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223417.xml 2023-01-11T22:44:49.4894938Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4895382Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4895932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4896386Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4896838Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2bnetnyp 2023-01-11T22:44:49.4897364Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2bnetnyp/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4897753Z 2023-01-11T22:44:49.4897859Z Running tests... 2023-01-11T22:44:49.4898257Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4898775Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4899258Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4899942Z Regardless of reentrant or non-reentrant checkpointing impl, ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4900433Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68289 2023-01-11T22:44:49.4900872Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68290 2023-01-11T22:44:49.4901458Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4901899Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4902464Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4902909Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4903468Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4903901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4904457Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4904905Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4905356Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm5_ju5z9 2023-01-11T22:44:49.4905874Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm5_ju5z9/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4906371Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4906841Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf4m9zpjc 2023-01-11T22:44:49.4907103Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf4m9zpjc/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4907324Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4907562Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4907798Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4908199Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4908591Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4908814Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4909041Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4909265Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4909499Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4909597Z ok (5.321s) 2023-01-11T22:44:49.4909618Z 2023-01-11T22:44:49.4909879Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4909989Z Ran 1 test in 5.322s 2023-01-11T22:44:49.4910009Z 2023-01-11T22:44:49.4910095Z OK 2023-01-11T22:44:49.4910115Z 2023-01-11T22:44:49.4910234Z Generating XML reports... 
2023-01-11T22:44:49.4910678Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223425.xml 2023-01-11T22:44:49.4911131Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4911302Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4911674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4911908Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4912162Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9_926y30 2023-01-11T22:44:49.4912418Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9_926y30/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4912438Z 2023-01-11T22:44:49.4912540Z Running tests... 2023-01-11T22:44:49.4912791Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4913097Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4913357Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4913695Z Regardless of reentrant or non-reentrant checkpointing impl, ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4913903Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68404 2023-01-11T22:44:49.4914119Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68405 2023-01-11T22:44:49.4914480Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4914646Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4915020Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4915193Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4915557Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4915733Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4916100Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4916281Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4916527Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfq_27f9a 2023-01-11T22:44:49.4916786Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfq_27f9a/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4917029Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplip0y_en 2023-01-11T22:44:49.4917278Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplip0y_en/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4917500Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4917726Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4917965Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4918203Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4918601Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4918988Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4919214Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4919438Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4919723Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4919940Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4920036Z ok (5.429s) 2023-01-11T22:44:49.4920055Z 2023-01-11T22:44:49.4920327Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4920431Z Ran 1 test in 5.430s 2023-01-11T22:44:49.4920450Z 2023-01-11T22:44:49.4920539Z OK 2023-01-11T22:44:49.4920609Z 2023-01-11T22:44:49.4920737Z Generating XML reports... 
2023-01-11T22:44:49.4921199Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223432.xml 2023-01-11T22:44:49.4921565Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4921725Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4922094Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4922288Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4922532Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm3l76jon 2023-01-11T22:44:49.4922793Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm3l76jon/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4922813Z 2023-01-11T22:44:49.4922923Z Running tests... 2023-01-11T22:44:49.4923188Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4923494Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4923721Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4924092Z Checkpoitning twice fails for non-static graph with reentrant checkpoint ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4924307Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68519 2023-01-11T22:44:49.4924518Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68520 2023-01-11T22:44:49.4924884Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4925049Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4925428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4925617Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4925972Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4926130Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4926496Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4926679Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4926925Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph1dliy6p 2023-01-11T22:44:49.4927187Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph1dliy6p/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4927438Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm7adi132 2023-01-11T22:44:49.4927697Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm7adi132/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4927917Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4928126Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4928395Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4928705Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4929109Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4929494Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4929773Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4930009Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4930782Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:44:49.4931565Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:44:49.4931795Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4932023Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4932124Z ok (5.443s) 2023-01-11T22:44:49.4932144Z 2023-01-11T22:44:49.4932412Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4932508Z Ran 1 test in 5.444s 2023-01-11T22:44:49.4932528Z 2023-01-11T22:44:49.4932615Z OK 2023-01-11T22:44:49.4932634Z 2023-01-11T22:44:49.4932750Z Generating XML reports... 2023-01-11T22:44:49.4933408Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223440.xml 2023-01-11T22:44:49.4933781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4933950Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4934318Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4934502Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4934745Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpclititaz 2023-01-11T22:44:49.4935014Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpclititaz/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4935034Z 2023-01-11T22:44:49.4935139Z Running tests... 
2023-01-11T22:44:49.4935402Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4935712Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4935947Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4936315Z Checkpoitning twice fails for non-static graph with reentrant checkpoint ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4936528Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68634 2023-01-11T22:44:49.4936740Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68635 2023-01-11T22:44:49.4937220Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4937394Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4937770Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4937956Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4938377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4938552Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4938921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4939104Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4939342Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxcfd5xsf 2023-01-11T22:44:49.4939615Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxcfd5xsf/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4939861Z INFO:torch.distributed.nn.jit.instantiator:Created a 
temporary directory at /tmp/tmp0guf7vyg 2023-01-11T22:44:49.4940119Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0guf7vyg/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4940345Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4940564Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4940797Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4941033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4941427Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4941807Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4941899Z ok (5.336s) 2023-01-11T22:44:49.4941919Z 2023-01-11T22:44:49.4942172Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4942274Z Ran 1 test in 5.336s 2023-01-11T22:44:49.4942294Z 2023-01-11T22:44:49.4942376Z OK 2023-01-11T22:44:49.4942398Z 2023-01-11T22:44:49.4942518Z Generating XML reports... 
2023-01-11T22:44:49.4942969Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223448.xml 2023-01-11T22:44:49.4943330Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4943489Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4943864Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4944057Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4944300Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt3ugb2h9 2023-01-11T22:44:49.4944558Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt3ugb2h9/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4944578Z 2023-01-11T22:44:49.4944685Z Running tests... 2023-01-11T22:44:49.4944945Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4945244Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4945471Z test_ddp_checkpointing_twice_weight_sharing (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4945722Z Checkpointing should work with static graph in the case of checkpointing ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4945998Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68749 2023-01-11T22:44:49.4946203Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68750 2023-01-11T22:44:49.4946572Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4946737Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4947153Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4947347Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4947710Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4947867Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4948227Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4948414Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4948665Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6jqul1k0 2023-01-11T22:44:49.4948923Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6jqul1k0/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4949169Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdxes8zut 2023-01-11T22:44:49.4949428Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdxes8zut/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4949651Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4949868Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4950095Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4950333Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4950726Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4951116Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4951346Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4951575Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4951790Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4952015Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4952107Z ok (5.342s) 2023-01-11T22:44:49.4952126Z 2023-01-11T22:44:49.4952377Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4952487Z Ran 1 test in 5.343s 2023-01-11T22:44:49.4952506Z 2023-01-11T22:44:49.4952592Z OK 2023-01-11T22:44:49.4952611Z 2023-01-11T22:44:49.4952726Z Generating XML reports... 
2023-01-11T22:44:49.4953181Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223456.xml 2023-01-11T22:44:49.4953551Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4953717Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4954087Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4954260Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4954506Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbdeu4esf 2023-01-11T22:44:49.4954836Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbdeu4esf/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4954856Z 2023-01-11T22:44:49.4954957Z Running tests... 2023-01-11T22:44:49.4955221Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4955519Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4955820Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4956088Z With reentrant autograd checkpointing impl, DDP will fail when there are ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4956300Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68864 2023-01-11T22:44:49.4956500Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68865 2023-01-11T22:44:49.4956868Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4957041Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4957413Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4957592Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4957948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4958115Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4958479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4958651Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4958894Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp41xo0hbs 2023-01-11T22:44:49.4959149Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp41xo0hbs/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4959396Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr4yeldet 2023-01-11T22:44:49.4959657Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr4yeldet/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4959874Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4960090Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4960327Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4960558Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4960947Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4961342Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4962118Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:44:49.4962886Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2023-01-11T22:44:49.4963907Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:44:49.4964023Z warnings.warn( 2023-01-11T22:44:49.4964929Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:44:49.4965037Z warnings.warn( 2023-01-11T22:44:49.4965264Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4965490Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4965718Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4965940Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4966037Z ok (5.334s) 2023-01-11T22:44:49.4966057Z 2023-01-11T22:44:49.4966318Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4966414Z Ran 1 test in 5.334s 2023-01-11T22:44:49.4966434Z 2023-01-11T22:44:49.4966516Z OK 2023-01-11T22:44:49.4966535Z 2023-01-11T22:44:49.4966652Z Generating XML reports... 
2023-01-11T22:44:49.4967105Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223503.xml 2023-01-11T22:44:49.4967471Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4967641Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4968011Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4968199Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4968449Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy6fd67ds 2023-01-11T22:44:49.4968704Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy6fd67ds/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4968723Z 2023-01-11T22:44:49.4968822Z Running tests... 2023-01-11T22:44:49.4969085Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4969392Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4969643Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4969903Z With reentrant autograd checkpointing impl, DDP will fail when there are ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4970115Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68979 2023-01-11T22:44:49.4970325Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68980 2023-01-11T22:44:49.4970681Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4970851Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4971220Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4971400Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4971831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4971995Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4972359Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4972589Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4972848Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi3uka9we 2023-01-11T22:44:49.4973299Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi3uka9we/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4973545Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6xuddl4o 2023-01-11T22:44:49.4973810Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6xuddl4o/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4974037Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4974258Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4974498Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4974736Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4975148Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4975525Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4976429Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:44:49.4976544Z warnings.warn( 2023-01-11T22:44:49.4977450Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:44:49.4977557Z warnings.warn( 2023-01-11T22:44:49.4977785Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4978005Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4978233Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4978465Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:44:49.4978560Z ok (5.338s) 2023-01-11T22:44:49.4978581Z 2023-01-11T22:44:49.4978848Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4978947Z Ran 1 test in 5.339s 2023-01-11T22:44:49.4978966Z 2023-01-11T22:44:49.4979052Z OK 2023-01-11T22:44:49.4979071Z 2023-01-11T22:44:49.4979196Z Generating XML reports... 2023-01-11T22:44:49.4979651Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223511.xml 2023-01-11T22:44:49.4980017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4980186Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4980559Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4980870Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4981111Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjdz6jb_x 2023-01-11T22:44:49.4981378Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjdz6jb_x/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4981399Z 2023-01-11T22:44:49.4981501Z Running tests... 2023-01-11T22:44:49.4981834Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4982163Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4982422Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4982656Z Test that checkpointing with weight sharing works. ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4982870Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69094 2023-01-11T22:44:49.4983084Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69095 2023-01-11T22:44:49.4983439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4983609Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4983984Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4984169Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4984528Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4984692Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4985059Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4985242Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4985481Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp27dhec3_ 2023-01-11T22:44:49.4985744Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp27dhec3_/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4985966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4986213Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjqk7in9x 2023-01-11T22:44:49.4986474Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjqk7in9x/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4986692Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4986931Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4987167Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4987566Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4987943Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4988170Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4988396Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4988618Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4988838Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4988931Z ok (5.309s) 2023-01-11T22:44:49.4988950Z 2023-01-11T22:44:49.4989207Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4989376Z Ran 1 test in 5.310s 2023-01-11T22:44:49.4989395Z 2023-01-11T22:44:49.4989472Z OK 2023-01-11T22:44:49.4989497Z 2023-01-11T22:44:49.4989606Z Generating XML reports... 
2023-01-11T22:44:49.4990063Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223519.xml 2023-01-11T22:44:49.4990424Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4990648Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4991033Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4991214Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4991463Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplgkbjeer 2023-01-11T22:44:49.4991724Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplgkbjeer/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4991749Z 2023-01-11T22:44:49.4991843Z Running tests... 2023-01-11T22:44:49.4992109Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.4992417Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.4992671Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.4992900Z Test that checkpointing with weight sharing works. ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.4993107Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69209 2023-01-11T22:44:49.4993312Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69210 2023-01-11T22:44:49.4993671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4993831Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4994204Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4994385Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4994739Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.4994905Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.4995273Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.4995452Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.4995699Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps4zbrm_q 2023-01-11T22:44:49.4995960Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps4zbrm_q/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4996198Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1e_71xoe 2023-01-11T22:44:49.4996451Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1e_71xoe/_remote_module_non_scriptable.py 2023-01-11T22:44:49.4996669Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.4996890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.4997132Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.4997363Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.4997757Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4998140Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.4998452Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4998670Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4998895Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4999116Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4999384Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4999617Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.4999833Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5000046Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5000141Z ok (5.333s) 2023-01-11T22:44:49.5000164Z 2023-01-11T22:44:49.5000425Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5000529Z Ran 1 test in 5.333s 2023-01-11T22:44:49.5000548Z 2023-01-11T22:44:49.5000631Z OK 2023-01-11T22:44:49.5000650Z 2023-01-11T22:44:49.5000765Z Generating XML reports... 
2023-01-11T22:44:49.5001220Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223526.xml 2023-01-11T22:44:49.5001588Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5001754Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5002127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5002310Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5002549Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0cwnc6mo 2023-01-11T22:44:49.5002810Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0cwnc6mo/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5002830Z 2023-01-11T22:44:49.5002934Z Running tests... 2023-01-11T22:44:49.5003192Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5003494Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5003717Z test_ddp_comm_hook_future_passing_cpu (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.5003979Z This unit test verifies whether the Future object is passed properly. ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5004188Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69324 2023-01-11T22:44:49.5004388Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69325 2023-01-11T22:44:49.5004754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5004931Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5005300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5005484Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5005840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5006005Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5006374Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5006551Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5006791Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8dqjd_ry 2023-01-11T22:44:49.5007116Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8dqjd_ry/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5007338Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5007581Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp35b2l84r 2023-01-11T22:44:49.5007838Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp35b2l84r/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5008095Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5008336Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5008569Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5008966Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5009358Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5009451Z ok (3.906s) 2023-01-11T22:44:49.5009472Z 2023-01-11T22:44:49.5009730Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5009836Z Ran 1 test in 3.906s 2023-01-11T22:44:49.5009855Z 2023-01-11T22:44:49.5009937Z OK 2023-01-11T22:44:49.5009956Z 2023-01-11T22:44:49.5010075Z Generating XML reports... 2023-01-11T22:44:49.5010524Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223534.xml 2023-01-11T22:44:49.5010888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5011048Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5011419Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5011606Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5011851Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfyapmrtx 2023-01-11T22:44:49.5012115Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfyapmrtx/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5012135Z 2023-01-11T22:44:49.5012239Z Running tests... 
2023-01-11T22:44:49.5012499Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5012804Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5013231Z test_ddp_comm_hook_future_passing_gpu_gloo (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.5013526Z This unit test verifies whether the Future object is passed properly using gloo backend. ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5013736Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69437 2023-01-11T22:44:49.5013948Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69438 2023-01-11T22:44:49.5014316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5014483Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5014858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5015040Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5015401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5015557Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5015923Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5016196Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5016447Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo1ek4p4e 2023-01-11T22:44:49.5016709Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo1ek4p4e/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5016949Z 
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5ivy425c 2023-01-11T22:44:49.5017263Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5ivy425c/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5017492Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5017714Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5017942Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5018174Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5018583Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5018970Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5019061Z ok (4.818s) 2023-01-11T22:44:49.5019081Z 2023-01-11T22:44:49.5019340Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5019448Z Ran 1 test in 4.818s 2023-01-11T22:44:49.5019467Z 2023-01-11T22:44:49.5019551Z OK 2023-01-11T22:44:49.5019569Z 2023-01-11T22:44:49.5019677Z Generating XML reports... 
2023-01-11T22:44:49.5020132Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223540.xml 2023-01-11T22:44:49.5020494Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5020664Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5021037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5021220Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5021463Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_yi6qgi8 2023-01-11T22:44:49.5021726Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_yi6qgi8/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5021745Z 2023-01-11T22:44:49.5021848Z Running tests... 2023-01-11T22:44:49.5022096Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5022398Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5022613Z test_ddp_comm_hook_register_just_once (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.5022891Z DDP communication hook can only be registered once. This test validates whether ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5023097Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69552 2023-01-11T22:44:49.5023305Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69553 2023-01-11T22:44:49.5023666Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5023836Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5024198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5024380Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5024731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5024965Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5025338Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5025517Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5025763Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjft3datr 2023-01-11T22:44:49.5026079Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjft3datr/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5026329Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqnitw8cx 2023-01-11T22:44:49.5026580Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqnitw8cx/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5026804Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5027023Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5027262Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5027498Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5027901Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5028297Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5028392Z ok (3.966s) 2023-01-11T22:44:49.5028412Z 2023-01-11T22:44:49.5028671Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5028800Z Ran 1 test in 3.966s 2023-01-11T22:44:49.5028820Z 2023-01-11T22:44:49.5028904Z OK 2023-01-11T22:44:49.5028923Z 2023-01-11T22:44:49.5029037Z Generating XML reports... 2023-01-11T22:44:49.5029487Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223547.xml 2023-01-11T22:44:49.5029851Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5030023Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5030393Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5030579Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5030821Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv2ryp153 2023-01-11T22:44:49.5031084Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv2ryp153/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5031104Z 2023-01-11T22:44:49.5031205Z Running tests... 
2023-01-11T22:44:49.5031467Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5031775Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5031992Z test_ddp_comm_hook_sparse_gradients (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.5032259Z Runs "test_sparse_gradients" unit test with DDP communication hook. We define a ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5032471Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69661 2023-01-11T22:44:49.5032681Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69662 2023-01-11T22:44:49.5033035Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5033203Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5033571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5033827Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5034195Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5034361Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5034724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5034966Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5035215Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoox817ma 2023-01-11T22:44:49.5035478Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoox817ma/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5035697Z INFO:torch.testing._internal.common_distributed:Starting 
event listener thread for rank 1 2023-01-11T22:44:49.5035941Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5peqgj4r 2023-01-11T22:44:49.5036212Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5peqgj4r/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5036429Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5036663Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5036899Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5037300Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5037676Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5037769Z ok (3.933s) 2023-01-11T22:44:49.5037788Z 2023-01-11T22:44:49.5038051Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5038163Z Ran 1 test in 3.933s 2023-01-11T22:44:49.5038182Z 2023-01-11T22:44:49.5038264Z OK 2023-01-11T22:44:49.5038283Z 2023-01-11T22:44:49.5038398Z Generating XML reports... 
2023-01-11T22:44:49.5038853Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223553.xml 2023-01-11T22:44:49.5039214Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5039384Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5039747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5039932Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5040175Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3leq76m3 2023-01-11T22:44:49.5040437Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3leq76m3/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5040460Z 2023-01-11T22:44:49.5040562Z Running tests... 2023-01-11T22:44:49.5040822Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5041124Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5041333Z test_ddp_invalid_comm_hook_init (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.5041593Z This unit test makes sure that register_comm_hook properly checks the format ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5041802Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69804 2023-01-11T22:44:49.5042008Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69805 2023-01-11T22:44:49.5042370Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5042535Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5042981Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5043162Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5043520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5043732Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5044100Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5044285Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5044533Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm785dw0v 2023-01-11T22:44:49.5044796Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm785dw0v/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5045043Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaubhqtu5 2023-01-11T22:44:49.5045303Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaubhqtu5/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5045520Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5045739Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5045971Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5046211Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5046612Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5046997Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5047101Z ok (4.028s) 2023-01-11T22:44:49.5047121Z 2023-01-11T22:44:49.5047375Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5047482Z Ran 1 test in 4.028s 2023-01-11T22:44:49.5047501Z 2023-01-11T22:44:49.5047587Z OK 2023-01-11T22:44:49.5047606Z 2023-01-11T22:44:49.5047719Z Generating XML reports... 2023-01-11T22:44:49.5048162Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223600.xml 2023-01-11T22:44:49.5048526Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5048695Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5049064Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5049251Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5049501Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvfdwnpkk 2023-01-11T22:44:49.5049766Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvfdwnpkk/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5049785Z 2023-01-11T22:44:49.5049887Z Running tests... 
2023-01-11T22:44:49.5050135Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5050443Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5050661Z test_ddp_invalid_comm_hook_return_type (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.5050931Z This test checks whether return annotation checked properly if defined. It also ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5051139Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69913 2023-01-11T22:44:49.5051352Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69914 2023-01-11T22:44:49.5051790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5051957Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5052327Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5052549Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5053115Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5053291Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5053664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5053841Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5054096Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplfg9b2dc 2023-01-11T22:44:49.5054357Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplfg9b2dc/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5054601Z INFO:torch.distributed.nn.jit.instantiator:Created a 
temporary directory at /tmp/tmpy5376syk 2023-01-11T22:44:49.5054849Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy5376syk/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5055072Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5055297Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5055532Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5055767Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5056167Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5056557Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5056653Z ok (4.027s) 2023-01-11T22:44:49.5056673Z 2023-01-11T22:44:49.5056927Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5057022Z Ran 1 test in 4.027s 2023-01-11T22:44:49.5057054Z 2023-01-11T22:44:49.5057132Z OK 2023-01-11T22:44:49.5057150Z 2023-01-11T22:44:49.5057263Z Generating XML reports... 
2023-01-11T22:44:49.5057716Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223606.xml 2023-01-11T22:44:49.5058078Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5058245Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5058621Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5058801Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5059047Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk1konsjw 2023-01-11T22:44:49.5059299Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk1konsjw/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5059322Z 2023-01-11T22:44:49.5059423Z Running tests... 2023-01-11T22:44:49.5059677Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5059982Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5060225Z test_find_unused_parameters_when_unused_parameters_empty (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.5060483Z An empty unused_parameters array does not imply find_unused_parameters = ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5060809Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70026 2023-01-11T22:44:49.5061019Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70027 2023-01-11T22:44:49.5061382Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5061550Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5061982Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5062181Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5062544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5062706Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5063073Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5063255Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5063501Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp07o8udu8 2023-01-11T22:44:49.5063752Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp07o8udu8/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5064036Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5qcb4fjh 2023-01-11T22:44:49.5064299Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5qcb4fjh/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5064518Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5064736Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5064972Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5065209Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5065607Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5065992Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5066769Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:44:49.5066863Z ok (4.847s) 2023-01-11T22:44:49.5066893Z 2023-01-11T22:44:49.5067142Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5067246Z Ran 1 test in 4.848s 2023-01-11T22:44:49.5067265Z 2023-01-11T22:44:49.5067349Z OK 2023-01-11T22:44:49.5067368Z 2023-01-11T22:44:49.5067481Z Generating XML reports... 
2023-01-11T22:44:49.5067942Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223612.xml 2023-01-11T22:44:49.5068309Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5068478Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5068849Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5069022Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5069342Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps_e7j6lu 2023-01-11T22:44:49.5069602Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps_e7j6lu/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5069622Z 2023-01-11T22:44:49.5069721Z Running tests... 2023-01-11T22:44:49.5069985Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5070339Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5070622Z test_global_local_unused_params_grad (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5070834Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70141 2023-01-11T22:44:49.5071035Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70142 2023-01-11T22:44:49.5071403Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5071576Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5071945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5072128Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5072488Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5072654Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5073017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5073199Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5073435Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_q2ms82s 2023-01-11T22:44:49.5073697Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_q2ms82s/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5073912Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5074160Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpisy10lwo 2023-01-11T22:44:49.5074419Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpisy10lwo/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5074640Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5074875Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5075110Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5075506Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5075893Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5075990Z ok (4.958s) 2023-01-11T22:44:49.5076010Z 2023-01-11T22:44:49.5076270Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5076376Z Ran 1 test in 4.958s 2023-01-11T22:44:49.5076396Z 2023-01-11T22:44:49.5076482Z OK 2023-01-11T22:44:49.5076500Z 2023-01-11T22:44:49.5076623Z Generating XML reports... 2023-01-11T22:44:49.5077078Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223620.xml 2023-01-11T22:44:49.5077440Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5077600Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5077973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5078228Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5078482Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmkrfpxnk 2023-01-11T22:44:49.5078748Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmkrfpxnk/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5078768Z 2023-01-11T22:44:49.5078870Z Running tests... 
2023-01-11T22:44:49.5079186Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5079508Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5079811Z test_global_local_unused_params_grad_with_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5080015Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70256 2023-01-11T22:44:49.5080224Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70257 2023-01-11T22:44:49.5080594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5080761Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5081135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5081327Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5081686Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5081850Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5082217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5082388Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5082641Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppkmc4bc5 2023-01-11T22:44:49.5082903Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppkmc4bc5/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5083149Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd4k3wvq0 2023-01-11T22:44:49.5083413Z 
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd4k3wvq0/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5083641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5083865Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5084100Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5084326Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5084726Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5085111Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5085205Z ok (4.828s) 2023-01-11T22:44:49.5085224Z 2023-01-11T22:44:49.5085481Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5085591Z Ran 1 test in 4.828s 2023-01-11T22:44:49.5085614Z 2023-01-11T22:44:49.5085699Z OK 2023-01-11T22:44:49.5085719Z 2023-01-11T22:44:49.5085834Z Generating XML reports... 
2023-01-11T22:44:49.5086290Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223627.xml 2023-01-11T22:44:49.5086642Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5086808Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5087260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5087444Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5087694Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn2xq9v6z 2023-01-11T22:44:49.5087999Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn2xq9v6z/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5088020Z 2023-01-11T22:44:49.5088132Z Running tests... 2023-01-11T22:44:49.5088401Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5088695Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5089000Z test_global_local_unused_params_grad_with_static_graph (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5089213Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70371 2023-01-11T22:44:49.5089433Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70372 2023-01-11T22:44:49.5089801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5089972Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5090350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5090536Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5090890Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5091046Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5091407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5091591Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5091837Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp18yn7h9f 2023-01-11T22:44:49.5092100Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp18yn7h9f/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5092346Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpltim_imo 2023-01-11T22:44:49.5092607Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpltim_imo/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5092828Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5093246Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5093480Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5093717Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5094123Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5094511Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5095418Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:44:49.5095531Z warnings.warn( 2023-01-11T22:44:49.5096432Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1911: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2023-01-11T22:44:49.5096631Z warnings.warn( 2023-01-11T22:44:49.5096728Z ok (4.843s) 2023-01-11T22:44:49.5096748Z 2023-01-11T22:44:49.5097019Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5097175Z Ran 1 test in 4.843s 2023-01-11T22:44:49.5097197Z 2023-01-11T22:44:49.5097297Z OK 2023-01-11T22:44:49.5097316Z 2023-01-11T22:44:49.5097439Z Generating XML reports... 
2023-01-11T22:44:49.5097903Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223634.xml 2023-01-11T22:44:49.5098271Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5098449Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5098822Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5099004Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5099251Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqyvuq52d 2023-01-11T22:44:49.5099508Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqyvuq52d/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5099528Z 2023-01-11T22:44:49.5099631Z Running tests... 2023-01-11T22:44:49.5099887Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5100193Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5100490Z test_gloo_backend_1gpu_module_device_ids_integer_list (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5100705Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70486 2023-01-11T22:44:49.5100913Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70487 2023-01-11T22:44:49.5101276Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5101433Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5101807Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5101992Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5102357Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5102523Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5102893Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5103078Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5103332Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpovfchepo 2023-01-11T22:44:49.5103596Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpovfchepo/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5103836Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcofm7wwu 2023-01-11T22:44:49.5104098Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcofm7wwu/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5104320Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5104542Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5104780Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5105084Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5105484Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5105875Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5106158Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5106386Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5106486Z ok (5.332s) 2023-01-11T22:44:49.5106506Z 2023-01-11T22:44:49.5106774Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5106884Z Ran 1 test in 5.332s 2023-01-11T22:44:49.5106904Z 2023-01-11T22:44:49.5106990Z OK 2023-01-11T22:44:49.5107009Z 2023-01-11T22:44:49.5107128Z Generating XML reports... 
2023-01-11T22:44:49.5107580Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223641.xml 2023-01-11T22:44:49.5107947Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5108106Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5108481Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5108668Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5108916Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfnepq0lm 2023-01-11T22:44:49.5109180Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfnepq0lm/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5109200Z 2023-01-11T22:44:49.5109306Z Running tests... 2023-01-11T22:44:49.5109570Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5109876Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5110182Z test_gloo_backend_1gpu_module_device_ids_torch_device_list (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5110384Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70603 2023-01-11T22:44:49.5110596Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70604 2023-01-11T22:44:49.5110962Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5111133Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5111508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5111691Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5112052Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5112216Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5112577Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5112764Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5113018Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm_9j8lc9 2023-01-11T22:44:49.5113265Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4x599hvl 2023-01-11T22:44:49.5113525Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm_9j8lc9/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5113780Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4x599hvl/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5114077Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5114300Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5114536Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5114838Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5115255Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5115646Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5115877Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5116104Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5116207Z ok (5.324s) 2023-01-11T22:44:49.5116227Z 2023-01-11T22:44:49.5116490Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5116601Z Ran 1 test in 5.324s 2023-01-11T22:44:49.5116621Z 2023-01-11T22:44:49.5116697Z OK 2023-01-11T22:44:49.5116729Z 2023-01-11T22:44:49.5116836Z Generating XML reports... 
2023-01-11T22:44:49.5117295Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223649.xml 2023-01-11T22:44:49.5117664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5117834Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5118208Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5118393Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5118649Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5ga21yr_ 2023-01-11T22:44:49.5118911Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5ga21yr_/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5118931Z 2023-01-11T22:44:49.5119022Z Running tests... 2023-01-11T22:44:49.5119284Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5119595Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5119864Z test_gloo_backend_2gpu_module (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5120081Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70720 2023-01-11T22:44:49.5120291Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70721 2023-01-11T22:44:49.5120659Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5120835Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5121209Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5121383Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5121742Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5121910Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5122282Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5122463Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5122712Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt0umd6uo 2023-01-11T22:44:49.5123055Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt0umd6uo/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5123298Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd6w3r4dd 2023-01-11T22:44:49.5123547Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd6w3r4dd/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5123771Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5124038Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5124189Z 
skip: Need at least 4 CUDA devices (3.935s) 2023-01-11T22:44:49.5124209Z 2023-01-11T22:44:49.5124481Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5124588Z Ran 1 test in 3.935s 2023-01-11T22:44:49.5124607Z 2023-01-11T22:44:49.5124709Z OK (skipped=1) 2023-01-11T22:44:49.5124728Z 2023-01-11T22:44:49.5124845Z Generating XML reports... 2023-01-11T22:44:49.5125305Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223656.xml 2023-01-11T22:44:49.5125661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5125834Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5126211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5126397Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5126643Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp79gqbqru 2023-01-11T22:44:49.5126904Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp79gqbqru/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5126923Z 2023-01-11T22:44:49.5127028Z Running tests... 2023-01-11T22:44:49.5127289Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5127587Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5127859Z test_gloo_backend_4gpu_module (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5128069Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70823 2023-01-11T22:44:49.5128282Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70824 2023-01-11T22:44:49.5128648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5128814Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5129186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5129398Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5129761Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5129916Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5130282Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5130464Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5130714Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpigw5iee0 2023-01-11T22:44:49.5130981Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpigw5iee0/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5131225Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpefmngoro 2023-01-11T22:44:49.5131487Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpefmngoro/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5131710Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5131990Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5132135Z 
skip: Need at least 8 CUDA devices (3.930s) 2023-01-11T22:44:49.5132155Z 2023-01-11T22:44:49.5132428Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5132533Z Ran 1 test in 3.930s 2023-01-11T22:44:49.5132552Z 2023-01-11T22:44:49.5132703Z OK (skipped=1) 2023-01-11T22:44:49.5132724Z 2023-01-11T22:44:49.5132850Z Generating XML reports... 2023-01-11T22:44:49.5133511Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223703.xml 2023-01-11T22:44:49.5133875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5134044Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5134407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5134591Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5134839Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2bfrzcmr 2023-01-11T22:44:49.5135101Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2bfrzcmr/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5135121Z 2023-01-11T22:44:49.5135226Z Running tests... 2023-01-11T22:44:49.5135486Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5135785Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5136058Z test_gloo_backend_cpu_module (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5136260Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70926 2023-01-11T22:44:49.5136476Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70927 2023-01-11T22:44:49.5136840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5137007Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5137375Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5137562Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5137923Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5138090Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5138455Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5138624Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5138876Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsqbrskmr 2023-01-11T22:44:49.5139141Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsqbrskmr/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5139390Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkvavx4ru 2023-01-11T22:44:49.5139655Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkvavx4ru/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5139877Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5140097Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5140334Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5140569Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5141057Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5141443Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5141673Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5141961Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5142069Z ok (3.942s) 2023-01-11T22:44:49.5142088Z 2023-01-11T22:44:49.5142357Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5142463Z Ran 1 test in 3.942s 2023-01-11T22:44:49.5142483Z 2023-01-11T22:44:49.5142571Z OK 2023-01-11T22:44:49.5142590Z 2023-01-11T22:44:49.5142696Z Generating XML reports... 
2023-01-11T22:44:49.5143147Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223709.xml 2023-01-11T22:44:49.5143522Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5143693Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5144066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5144252Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5144496Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa9kv_2vz 2023-01-11T22:44:49.5144758Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa9kv_2vz/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5144777Z 2023-01-11T22:44:49.5144877Z Running tests... 2023-01-11T22:44:49.5145126Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5145429Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5145711Z test_gloo_backend_cpu_module_grad_is_view (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5145923Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71041 2023-01-11T22:44:49.5146134Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71042 2023-01-11T22:44:49.5146498Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5146667Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5147039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5147212Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5147568Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5147738Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5148103Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5148280Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5148526Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4_2dh5u7 2023-01-11T22:44:49.5148787Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4_2dh5u7/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5149031Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpze_qh7te 2023-01-11T22:44:49.5149290Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpze_qh7te/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5149503Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5149792Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5150031Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5150266Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5150720Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5151121Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5151351Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5151582Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5151677Z ok (3.937s) 2023-01-11T22:44:49.5151697Z 2023-01-11T22:44:49.5151952Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5152060Z Ran 1 test in 3.937s 2023-01-11T22:44:49.5152079Z 2023-01-11T22:44:49.5152164Z OK 2023-01-11T22:44:49.5152184Z 2023-01-11T22:44:49.5152298Z Generating XML reports... 
2023-01-11T22:44:49.5152750Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223715.xml 2023-01-11T22:44:49.5153117Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5153287Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5153664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5153838Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5154083Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkmrunggm 2023-01-11T22:44:49.5154350Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkmrunggm/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5154370Z 2023-01-11T22:44:49.5154471Z Running tests... 2023-01-11T22:44:49.5154728Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5155028Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5155225Z test_ignored_output (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.5155472Z Test that the output of a model can be ignored and that there is no ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5155681Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71156 2023-01-11T22:44:49.5155880Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71157 2023-01-11T22:44:49.5156244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5156420Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5156791Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5156980Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5157340Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5157510Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5157880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5158049Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5158300Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn_8ou2qv 2023-01-11T22:44:49.5158565Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn_8ou2qv/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5158888Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaau89tlf 2023-01-11T22:44:49.5159148Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaau89tlf/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5159371Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5159626Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5159874Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5160112Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5160504Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5160897Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5161133Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5161358Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5161453Z ok (3.940s) 2023-01-11T22:44:49.5161473Z 2023-01-11T22:44:49.5161734Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5161843Z Ran 1 test in 3.940s 2023-01-11T22:44:49.5161863Z 2023-01-11T22:44:49.5161947Z OK 2023-01-11T22:44:49.5161967Z 2023-01-11T22:44:49.5162073Z Generating XML reports... 
2023-01-11T22:44:49.5162528Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223721.xml 2023-01-11T22:44:49.5162897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5163074Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5163448Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5163633Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5163882Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsb6qqnm_ 2023-01-11T22:44:49.5164147Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsb6qqnm_/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5164167Z 2023-01-11T22:44:49.5164266Z Running tests... 2023-01-11T22:44:49.5164516Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5164818Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5165050Z test_ignored_output_with_unused_parameters (__main__.DistributedDataParallelTest) 2023-01-11T22:44:49.5165298Z Test that the output of a model can be ignored and that there is no ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5165516Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71299 2023-01-11T22:44:49.5165728Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71300 2023-01-11T22:44:49.5166100Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5166272Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5166633Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5166818Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5167178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5167348Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5167795Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5167982Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5168231Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq_5byo2x 2023-01-11T22:44:49.5168541Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq_5byo2x/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5168801Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph6h128q5 2023-01-11T22:44:49.5169051Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph6h128q5/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5169302Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5169518Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5169767Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5170006Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5170415Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5170810Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5170906Z ok (3.918s) 2023-01-11T22:44:49.5170926Z 2023-01-11T22:44:49.5171189Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5171284Z Ran 1 test in 3.918s 2023-01-11T22:44:49.5171304Z 2023-01-11T22:44:49.5171393Z OK 2023-01-11T22:44:49.5171412Z 2023-01-11T22:44:49.5171538Z Generating XML reports... 2023-01-11T22:44:49.5171993Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223727.xml 2023-01-11T22:44:49.5172362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5172534Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5173096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5173298Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5173542Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzgh9hpvj 2023-01-11T22:44:49.5173809Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzgh9hpvj/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5173829Z 2023-01-11T22:44:49.5173932Z Running tests... 
2023-01-11T22:44:49.5174198Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5174506Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5174779Z test_ignored_sharded_tensor (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5174991Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71442 2023-01-11T22:44:49.5175206Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71443 2023-01-11T22:44:49.5175577Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5175736Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5176106Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5176294Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5176653Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5176915Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5177292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5177474Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5177727Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxm0n7yil 2023-01-11T22:44:49.5178045Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxm0n7yil/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5178305Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn7uyqdh0 2023-01-11T22:44:49.5178574Z INFO:torch.distributed.nn.jit.instantiator:Writing 
/tmp/tmpn7uyqdh0/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5178796Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5179021Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5179260Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5179496Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5179902Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5180292Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5180379Z ok (4.838s) 2023-01-11T22:44:49.5180399Z 2023-01-11T22:44:49.5180661Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5180766Z Ran 1 test in 4.838s 2023-01-11T22:44:49.5180786Z 2023-01-11T22:44:49.5180874Z OK 2023-01-11T22:44:49.5180893Z 2023-01-11T22:44:49.5181011Z Generating XML reports... 
2023-01-11T22:44:49.5181474Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223734.xml 2023-01-11T22:44:49.5181834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5182003Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5182377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5182552Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5182802Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1hj35h1c 2023-01-11T22:44:49.5183065Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1hj35h1c/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5183085Z 2023-01-11T22:44:49.5183187Z Running tests... 2023-01-11T22:44:49.5183450Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5183759Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5184029Z test_invalid_powerSGD_state (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5184243Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71553 2023-01-11T22:44:49.5184446Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71554 2023-01-11T22:44:49.5184811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5184978Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5185350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5185533Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5185981Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5186149Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5186524Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5186708Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5186999Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmposfsn1w4 2023-01-11T22:44:49.5187273Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmposfsn1w4/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5187520Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv9yagdpv 2023-01-11T22:44:49.5187785Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv9yagdpv/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5188013Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5188557Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; 
start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:44:49.5189098Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:44:49.5189635Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:44:49.5190170Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:44:49.5190704Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:44:49.5191229Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; 
compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:44:49.5191457Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5191984Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:44:49.5192503Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:44:49.5193101Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:44:49.5193673Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:44:49.5194212Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 
10000; batch_tensors_with_same_shape = False 2023-01-11T22:44:49.5194741Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2023-01-11T22:44:49.5194840Z ok (3.925s) 2023-01-11T22:44:49.5194860Z 2023-01-11T22:44:49.5195136Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5195233Z Ran 1 test in 3.925s 2023-01-11T22:44:49.5195263Z 2023-01-11T22:44:49.5195339Z OK 2023-01-11T22:44:49.5195358Z 2023-01-11T22:44:49.5195475Z Generating XML reports... 2023-01-11T22:44:49.5195933Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223741.xml 2023-01-11T22:44:49.5196309Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5196483Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5196861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5197049Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5197302Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbgxa7d0q 2023-01-11T22:44:49.5197556Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbgxa7d0q/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5197576Z 2023-01-11T22:44:49.5197678Z Running tests... 
2023-01-11T22:44:49.5197946Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5198253Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5198526Z test_save_load_checkpoint (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5198740Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71656 2023-01-11T22:44:49.5198949Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71657 2023-01-11T22:44:49.5199317Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5199476Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5199851Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5200037Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5200401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5200635Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5201008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5201190Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5201442Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwcd4d844 2023-01-11T22:44:49.5201753Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwcd4d844/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5201997Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3mgtult8 2023-01-11T22:44:49.5202256Z INFO:torch.distributed.nn.jit.instantiator:Writing 
/tmp/tmp3mgtult8/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5202478Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5202698Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5202951Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5203188Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5203595Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5203987Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5204220Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5204439Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5204669Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5204887Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5204989Z ok (5.329s) 2023-01-11T22:44:49.5205010Z 2023-01-11T22:44:49.5205276Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5205385Z Ran 1 test in 5.330s 2023-01-11T22:44:49.5205405Z 2023-01-11T22:44:49.5205496Z OK 2023-01-11T22:44:49.5205515Z 2023-01-11T22:44:49.5205635Z Generating XML reports... 
2023-01-11T22:44:49.5206084Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223747.xml 2023-01-11T22:44:49.5206454Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5206627Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5207001Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5207191Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5207444Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgwrhfvqu 2023-01-11T22:44:49.5207708Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgwrhfvqu/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5207728Z 2023-01-11T22:44:49.5207831Z Running tests... 2023-01-11T22:44:49.5208102Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5208403Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5208669Z test_sparse_gradients (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5208885Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71771 2023-01-11T22:44:49.5209098Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71772 2023-01-11T22:44:49.5209460Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5209698Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5210082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5210270Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5210665Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5210843Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5211217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5211401Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5211655Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp16thjdmp 2023-01-11T22:44:49.5211927Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp16thjdmp/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5212174Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp99vb6vcv 2023-01-11T22:44:49.5212433Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp99vb6vcv/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5212659Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5213049Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5213305Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5213547Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5213954Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5214353Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5214454Z ok (3.942s) 2023-01-11T22:44:49.5214473Z 2023-01-11T22:44:49.5214731Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5214839Z Ran 1 test in 3.942s 2023-01-11T22:44:49.5214858Z 2023-01-11T22:44:49.5214935Z OK 2023-01-11T22:44:49.5214966Z 2023-01-11T22:44:49.5215076Z Generating XML reports... 2023-01-11T22:44:49.5215533Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223755.xml 2023-01-11T22:44:49.5215907Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5216079Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5216454Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5216645Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5216895Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzn9we_rl 2023-01-11T22:44:49.5217158Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzn9we_rl/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5217178Z 2023-01-11T22:44:49.5217271Z Running tests... 
2023-01-11T22:44:49.5217535Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5217847Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5218126Z test_sparse_gradients_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5218340Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71914 2023-01-11T22:44:49.5218556Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71915 2023-01-11T22:44:49.5219020Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5219193Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5219563Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5219800Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5220182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5220351Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5220716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5220898Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5221154Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8toui57d 2023-01-11T22:44:49.5221400Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8m3hjv9x 2023-01-11T22:44:49.5221662Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8toui57d/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5221905Z 
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8m3hjv9x/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5222134Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5222353Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5222593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5222829Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5223235Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5223628Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5223727Z ok (3.936s) 2023-01-11T22:44:49.5223747Z 2023-01-11T22:44:49.5224010Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5224105Z Ran 1 test in 3.936s 2023-01-11T22:44:49.5224128Z 2023-01-11T22:44:49.5224218Z OK 2023-01-11T22:44:49.5224238Z 2023-01-11T22:44:49.5224359Z Generating XML reports... 
2023-01-11T22:44:49.5224818Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223801.xml 2023-01-11T22:44:49.5225184Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5225356Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5225737Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5225924Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5226163Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqyp4f3kh 2023-01-11T22:44:49.5226431Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqyp4f3kh/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5226451Z 2023-01-11T22:44:49.5226559Z Running tests... 2023-01-11T22:44:49.5226825Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5227131Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5227405Z test_sync_batch_norm_empty_input (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5227619Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72057 2023-01-11T22:44:49.5227905Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72058 2023-01-11T22:44:49.5228280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5228438Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5228858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5229053Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5229421Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5229591Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5229989Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5230179Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5230434Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphieg7bec 2023-01-11T22:44:49.5230688Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphieg7bec/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5230935Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpol9ri9bv 2023-01-11T22:44:49.5231204Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpol9ri9bv/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5231430Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5231657Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5231898Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5232132Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5232541Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5232933Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5233156Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5233386Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5233482Z ok (5.630s) 2023-01-11T22:44:49.5233502Z 2023-01-11T22:44:49.5233764Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5233875Z Ran 1 test in 5.630s 2023-01-11T22:44:49.5233895Z 2023-01-11T22:44:49.5233983Z OK 2023-01-11T22:44:49.5234002Z 2023-01-11T22:44:49.5234123Z Generating XML reports... 
2023-01-11T22:44:49.5234583Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223807.xml 2023-01-11T22:44:49.5234948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5235108Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5235485Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5235674Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5235927Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpho9lp7ml 2023-01-11T22:44:49.5236195Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpho9lp7ml/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5236215Z 2023-01-11T22:44:49.5236321Z Running tests... 2023-01-11T22:44:49.5236582Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5236962Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5237230Z test_sync_batch_norm_only_empty_input (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5237446Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72172 2023-01-11T22:44:49.5237709Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72173 2023-01-11T22:44:49.5238088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5238259Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5238633Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5238823Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5239188Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5239359Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5239714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5239899Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5240156Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6knm6jcq 2023-01-11T22:44:49.5240420Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6knm6jcq/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5240663Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpym_zagsv 2023-01-11T22:44:49.5240923Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpym_zagsv/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5241149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5241377Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5241604Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5241843Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5242243Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5242636Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:44:49.5242866Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5243095Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:44:49.5243195Z ok (5.340s) 2023-01-11T22:44:49.5243218Z 2023-01-11T22:44:49.5243487Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5243597Z Ran 1 test in 5.340s 2023-01-11T22:44:49.5243616Z 2023-01-11T22:44:49.5243692Z OK 2023-01-11T22:44:49.5243710Z 2023-01-11T22:44:49.5243832Z Generating XML reports... 
2023-01-11T22:44:49.5244289Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223815.xml 2023-01-11T22:44:49.5244661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5244832Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5245205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5245391Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5245638Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmkd4hubi 2023-01-11T22:44:49.5245980Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmkd4hubi/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5246001Z 2023-01-11T22:44:49.5246093Z Running tests... 2023-01-11T22:44:49.5246362Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5246670Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5247041Z test_all_to_all_single (__main__.GlooProcessGroupWithDispatchedCollectivesTests) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5247260Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72287 2023-01-11T22:44:49.5247632Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5247807Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5248189Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5248365Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5248621Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppy6hrxpo 2023-01-11T22:44:49.5248890Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppy6hrxpo/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5249120Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5249357Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5249752Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:44:49.5249850Z ok (3.819s) 2023-01-11T22:44:49.5249870Z 2023-01-11T22:44:49.5250133Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5250243Z Ran 1 test in 3.819s 2023-01-11T22:44:49.5250263Z 2023-01-11T22:44:49.5250339Z OK 2023-01-11T22:44:49.5250357Z 2023-01-11T22:44:49.5250476Z Generating XML reports... 
2023-01-11T22:44:49.5251023Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223823.xml 2023-01-11T22:44:49.5251392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5251561Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5251931Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5252120Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5252375Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqzr6hfv1 2023-01-11T22:44:49.5252630Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqzr6hfv1/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5252663Z 2023-01-11T22:44:49.5252755Z Running tests... 2023-01-11T22:44:49.5253167Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5253484Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5253822Z test_allgather_coalesced (__main__.GlooProcessGroupWithDispatchedCollectivesTests) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5254037Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72359 2023-01-11T22:44:49.5254403Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5254575Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5254951Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5255254Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5255508Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkmiqv9ip 2023-01-11T22:44:49.5255774Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkmiqv9ip/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5255995Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5256290Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5256712Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:44:49.5257456Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. 
If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.5257571Z warnings.warn( 2023-01-11T22:44:49.5257673Z ok (3.844s) 2023-01-11T22:44:49.5257693Z 2023-01-11T22:44:49.5257943Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5258055Z Ran 1 test in 3.844s 2023-01-11T22:44:49.5258074Z 2023-01-11T22:44:49.5258160Z OK 2023-01-11T22:44:49.5258179Z 2023-01-11T22:44:49.5258300Z Generating XML reports... 2023-01-11T22:44:49.5258849Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223829.xml 2023-01-11T22:44:49.5259211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5259377Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5259750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5259941Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5260180Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpli6xrb4f 2023-01-11T22:44:49.5260446Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpli6xrb4f/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5260466Z 2023-01-11T22:44:49.5260572Z Running tests... 2023-01-11T22:44:49.5260837Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5261146Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5261477Z test_allreduce_coalesced (__main__.GlooProcessGroupWithDispatchedCollectivesTests) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5261693Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72431 2023-01-11T22:44:49.5262058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5262221Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5262600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5262784Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5263037Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0oprjzqm 2023-01-11T22:44:49.5263303Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0oprjzqm/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5263529Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5263769Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5264165Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:44:49.5264977Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. 
If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.5265087Z warnings.warn( 2023-01-11T22:44:49.5265171Z ok (3.842s) 2023-01-11T22:44:49.5265191Z 2023-01-11T22:44:49.5265499Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5265613Z Ran 1 test in 3.842s 2023-01-11T22:44:49.5265632Z 2023-01-11T22:44:49.5265720Z OK 2023-01-11T22:44:49.5265738Z 2023-01-11T22:44:49.5265861Z Generating XML reports... 2023-01-11T22:44:49.5266413Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223835.xml 2023-01-11T22:44:49.5266774Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5266953Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5267315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5267503Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5267758Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcfgipn26 2023-01-11T22:44:49.5268021Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcfgipn26/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5268041Z 2023-01-11T22:44:49.5268146Z Running tests... 2023-01-11T22:44:49.5268413Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5268723Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5269159Z test_collectives (__main__.GlooProcessGroupWithDispatchedCollectivesTests) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5269380Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72503 2023-01-11T22:44:49.5269732Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5269907Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5270290Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5270477Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5270731Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf6raym68 2023-01-11T22:44:49.5270996Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf6raym68/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5271222Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5271469Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5271852Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:44:49.5271956Z ok (3.814s) 2023-01-11T22:44:49.5271976Z 2023-01-11T22:44:49.5272239Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5272348Z Ran 1 test in 3.814s 2023-01-11T22:44:49.5272368Z 2023-01-11T22:44:49.5272456Z OK 2023-01-11T22:44:49.5272475Z 2023-01-11T22:44:49.5272597Z Generating XML reports... 
2023-01-11T22:44:49.5273144Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223841.xml 2023-01-11T22:44:49.5273508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5273742Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5274109Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5274296Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5274550Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp18tyozr3 2023-01-11T22:44:49.5274867Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp18tyozr3/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5274889Z 2023-01-11T22:44:49.5274998Z Running tests... 2023-01-11T22:44:49.5275261Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5275569Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5275898Z test_monitored_barrier (__main__.GlooProcessGroupWithDispatchedCollectivesTests) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5276103Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72575 2023-01-11T22:44:49.5276473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5276648Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5277020Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5277210Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5277460Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpslrujyjo 2023-01-11T22:44:49.5277730Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpslrujyjo/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5277957Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5278196Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5278582Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:44:49.5278680Z ok (3.827s) 2023-01-11T22:44:49.5278700Z 2023-01-11T22:44:49.5278960Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5279064Z Ran 1 test in 3.828s 2023-01-11T22:44:49.5279084Z 2023-01-11T22:44:49.5279175Z OK 2023-01-11T22:44:49.5279198Z 2023-01-11T22:44:49.5279319Z Generating XML reports... 
2023-01-11T22:44:49.5279861Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223847.xml 2023-01-11T22:44:49.5280228Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5280401Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5280764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5280953Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5281206Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoa3m11kk 2023-01-11T22:44:49.5281472Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoa3m11kk/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5281494Z 2023-01-11T22:44:49.5281600Z Running tests... 2023-01-11T22:44:49.5281863Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5282171Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5282416Z test_allgather_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5282616Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72647 2023-01-11T22:44:49.5282895Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72648 2023-01-11T22:44:49.5283104Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 72649 2023-01-11T22:44:49.5283311Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 72650 2023-01-11T22:44:49.5283696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5283918Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5284310Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5284496Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5284844Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5285011Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5285380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5285568Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5285926Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5286093Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5286461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5286646Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5287010Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5287165Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5287534Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5287715Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5287967Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnfvui9b2 2023-01-11T22:44:49.5288232Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnfvui9b2/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5288483Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9qtcqvmo 2023-01-11T22:44:49.5288748Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9qtcqvmo/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5288995Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmk1a5oq0 2023-01-11T22:44:49.5289254Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmk1a5oq0/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5289488Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp56erxeub 2023-01-11T22:44:49.5289747Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp56erxeub/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5289974Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5290202Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5290428Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5290650Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5290748Z ok (4.051s) 
2023-01-11T22:44:49.5290768Z 2023-01-11T22:44:49.5291039Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5291134Z Ran 1 test in 4.051s 2023-01-11T22:44:49.5291153Z 2023-01-11T22:44:49.5291241Z OK 2023-01-11T22:44:49.5291260Z 2023-01-11T22:44:49.5291451Z Generating XML reports... 2023-01-11T22:44:49.5291884Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223853.xml 2023-01-11T22:44:49.5292250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5292422Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5292847Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5293204Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5293444Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzxk9uo3w 2023-01-11T22:44:49.5293714Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzxk9uo3w/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5293734Z 2023-01-11T22:44:49.5293842Z Running tests... 2023-01-11T22:44:49.5294122Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5294432Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5294683Z test_allgather_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5294893Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72830 2023-01-11T22:44:49.5295111Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72831 2023-01-11T22:44:49.5295320Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 72832 2023-01-11T22:44:49.5295514Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 72833 2023-01-11T22:44:49.5295885Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5296056Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5296442Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5296630Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5296984Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5297148Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5297518Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5297688Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5298052Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5298222Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5298586Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5298774Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5299129Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5299298Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5299666Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5299851Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5300092Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe39r09yi 2023-01-11T22:44:49.5300354Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe39r09yi/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5300602Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpga93b4m6 2023-01-11T22:44:49.5300958Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpga93b4m6/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5301207Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpighsyuzl 2023-01-11T22:44:49.5301468Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpighsyuzl/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5301748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5301982Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5302217Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa75ic7xt 2023-01-11T22:44:49.5302479Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa75ic7xt/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5302701Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5302927Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5303026Z ok (5.240s) 
2023-01-11T22:44:49.5303046Z 2023-01-11T22:44:49.5303318Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5303429Z Ran 1 test in 5.241s 2023-01-11T22:44:49.5303448Z 2023-01-11T22:44:49.5303540Z OK 2023-01-11T22:44:49.5303559Z 2023-01-11T22:44:49.5303678Z Generating XML reports... 2023-01-11T22:44:49.5304097Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223900.xml 2023-01-11T22:44:49.5304462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5304631Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5305008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5305201Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5305447Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe5nd1xf8 2023-01-11T22:44:49.5305705Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe5nd1xf8/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5305725Z 2023-01-11T22:44:49.5305828Z Running tests... 2023-01-11T22:44:49.5306081Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5306388Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5306630Z test_allgather_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5306847Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73017 2023-01-11T22:44:49.5307060Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73018 2023-01-11T22:44:49.5307267Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 73019 2023-01-11T22:44:49.5307480Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 73020 2023-01-11T22:44:49.5307849Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5308019Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5308383Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5308572Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5308932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5309102Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5309466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5309733Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5310096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5310265Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5310670Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5310861Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5311222Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5311387Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5311758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5311950Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5312204Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8zl8m3u3 2023-01-11T22:44:49.5312469Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8zl8m3u3/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5312715Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8lefoz89 2023-01-11T22:44:49.5312968Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8lefoz89/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5313214Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpadh0o929 2023-01-11T22:44:49.5313471Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpadh0o929/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5313692Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5313918Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5314143Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5314387Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpneho9eaa 2023-01-11T22:44:49.5314646Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpneho9eaa/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5314854Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5314955Z ok (4.118s) 
2023-01-11T22:44:49.5314974Z 2023-01-11T22:44:49.5315243Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5315353Z Ran 1 test in 4.118s 2023-01-11T22:44:49.5315372Z 2023-01-11T22:44:49.5315456Z OK 2023-01-11T22:44:49.5315475Z 2023-01-11T22:44:49.5315595Z Generating XML reports... 2023-01-11T22:44:49.5316027Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223907.xml 2023-01-11T22:44:49.5316398Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5316569Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5316926Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5317115Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5317366Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpppe7bzbg 2023-01-11T22:44:49.5317631Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpppe7bzbg/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5317651Z 2023-01-11T22:44:49.5317757Z Running tests... 2023-01-11T22:44:49.5318019Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5318323Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5318649Z test_allgather_coalesced_async (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5318851Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73200 2023-01-11T22:44:49.5319064Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73201 2023-01-11T22:44:49.5319273Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 73202 2023-01-11T22:44:49.5319523Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 73203 2023-01-11T22:44:49.5319908Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5320080Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5320456Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5320650Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5321008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5321163Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5321534Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5321725Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5322085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5322252Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5322612Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5322797Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5323163Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5323316Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5323683Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5323868Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5324118Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplw9lng3y 2023-01-11T22:44:49.5324386Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplw9lng3y/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5324637Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp38ouh3kz 2023-01-11T22:44:49.5324898Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp38ouh3kz/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5325149Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpllep4up9 2023-01-11T22:44:49.5325413Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpllep4up9/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5325624Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5325842Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5326065Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5326309Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx9joal4d 2023-01-11T22:44:49.5326568Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx9joal4d/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5326784Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5327023Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5327329Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:44:49.5327554Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5327787Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:44:49.5328243Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5328654Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5329391Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.5329504Z warnings.warn( 2023-01-11T22:44:49.5330230Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.5330365Z warnings.warn( 2023-01-11T22:44:49.5330762Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5331151Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2023-01-11T22:44:49.5331875Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.5331973Z warnings.warn( 2023-01-11T22:44:49.5332692Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.5332798Z warnings.warn( 2023-01-11T22:44:49.5333077Z ok (4.144s) 2023-01-11T22:44:49.5333098Z 2023-01-11T22:44:49.5333375Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5333489Z Ran 1 test in 4.144s 2023-01-11T22:44:49.5333509Z 2023-01-11T22:44:49.5333597Z OK 2023-01-11T22:44:49.5333617Z 2023-01-11T22:44:49.5333736Z Generating XML reports... 
2023-01-11T22:44:49.5334163Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223914.xml 2023-01-11T22:44:49.5334520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5334697Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5335070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5335261Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5335515Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz_3f3ecy 2023-01-11T22:44:49.5335780Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz_3f3ecy/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5335800Z 2023-01-11T22:44:49.5335904Z Running tests... 2023-01-11T22:44:49.5336167Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5336460Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5336820Z test_allgather_coalesced_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5337037Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73383 2023-01-11T22:44:49.5337250Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73384 2023-01-11T22:44:49.5337462Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 73385 2023-01-11T22:44:49.5337727Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 73386 2023-01-11T22:44:49.5338115Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5338289Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5338666Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5338841Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5339205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5339375Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5339747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5339940Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5340296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5340466Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5340844Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5341016Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5341377Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5341544Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5341910Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5342092Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5342348Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoprmm1uq 2023-01-11T22:44:49.5342615Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoprmm1uq/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5342860Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_vl92i77 2023-01-11T22:44:49.5343119Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_vl92i77/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5343329Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5343582Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5zv9xotp 2023-01-11T22:44:49.5343846Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5zv9xotp/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5344069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5344295Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5344542Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt1p2awxt 2023-01-11T22:44:49.5344814Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt1p2awxt/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5345041Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5345790Z 
/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.5345957Z warnings.warn( 2023-01-11T22:44:49.5346738Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.5346852Z warnings.warn( 2023-01-11T22:44:49.5347583Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.5347691Z warnings.warn( 2023-01-11T22:44:49.5348409Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2588: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.5348515Z warnings.warn( 2023-01-11T22:44:49.5348612Z ok (4.121s) 2023-01-11T22:44:49.5348632Z 2023-01-11T22:44:49.5348899Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5349013Z Ran 1 test in 4.121s 2023-01-11T22:44:49.5349033Z 2023-01-11T22:44:49.5349109Z OK 2023-01-11T22:44:49.5349127Z 2023-01-11T22:44:49.5349250Z Generating XML reports... 
2023-01-11T22:44:49.5349685Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223920.xml 2023-01-11T22:44:49.5350054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5350232Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5350607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5350801Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5351057Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvnuya6uj 2023-01-11T22:44:49.5351313Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvnuya6uj/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5351350Z 2023-01-11T22:44:49.5351442Z Running tests... 2023-01-11T22:44:49.5351708Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5352017Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5352285Z test_allgather_noncontiguous_input (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5352506Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73566 2023-01-11T22:44:49.5352724Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73567 2023-01-11T22:44:49.5352937Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 73568 2023-01-11T22:44:49.5353148Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 73569 2023-01-11T22:44:49.5353506Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5353683Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5354062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5354250Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5354616Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5354854Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5355235Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5355420Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5355813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5355995Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5356368Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5356549Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5356906Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5357083Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5357458Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5357642Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5357894Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjwus1boy 2023-01-11T22:44:49.5358136Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoo1ugx4q 2023-01-11T22:44:49.5358408Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjwus1boy/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5358670Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoo1ugx4q/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5358923Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwgppzw13 2023-01-11T22:44:49.5359185Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwgppzw13/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5359412Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5359639Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5359863Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5360116Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiepc4caf 2023-01-11T22:44:49.5360364Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiepc4caf/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5360587Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5360686Z ok (4.123s) 
2023-01-11T22:44:49.5360706Z 2023-01-11T22:44:49.5360977Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5361086Z Ran 1 test in 4.124s 2023-01-11T22:44:49.5361109Z 2023-01-11T22:44:49.5361204Z OK 2023-01-11T22:44:49.5361223Z 2023-01-11T22:44:49.5361345Z Generating XML reports... 2023-01-11T22:44:49.5361779Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223927.xml 2023-01-11T22:44:49.5362130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5362306Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5362684Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5362875Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5363122Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpww8t_zci 2023-01-11T22:44:49.5363383Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpww8t_zci/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5363461Z 2023-01-11T22:44:49.5363573Z Running tests... 2023-01-11T22:44:49.5363842Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5364137Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5364418Z test_allgather_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5364690Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73749 2023-01-11T22:44:49.5364918Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73750 2023-01-11T22:44:49.5365133Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 73751 2023-01-11T22:44:49.5365346Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 73752 2023-01-11T22:44:49.5365730Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5365908Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5366281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5366454Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5366813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5366991Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5367362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5367546Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5367904Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5368074Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5368450Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5368620Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5368977Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5369146Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5369519Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5369706Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5369959Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp06fag6tc 2023-01-11T22:44:49.5370224Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp06fag6tc/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5370455Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5370707Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzieef2be 2023-01-11T22:44:49.5370956Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzieef2be/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5371181Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5371433Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpebdlvbhm 2023-01-11T22:44:49.5371696Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpebdlvbhm/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5371939Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfbx8zxt8 2023-01-11T22:44:49.5372201Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfbx8zxt8/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5372505Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5372727Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5372813Z ok (4.723s) 
2023-01-11T22:44:49.5372848Z 2023-01-11T22:44:49.5373467Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5373582Z Ran 1 test in 4.723s 2023-01-11T22:44:49.5373602Z 2023-01-11T22:44:49.5373694Z OK 2023-01-11T22:44:49.5373796Z 2023-01-11T22:44:49.5373926Z Generating XML reports... 2023-01-11T22:44:49.5374366Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223933.xml 2023-01-11T22:44:49.5374731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5374902Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5375271Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5375451Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5375701Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj2w0yddy 2023-01-11T22:44:49.5375966Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj2w0yddy/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5375986Z 2023-01-11T22:44:49.5376094Z Running tests... 2023-01-11T22:44:49.5376360Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5376669Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5376925Z test_allgather_stress_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5377140Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73956 2023-01-11T22:44:49.5377339Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73957 2023-01-11T22:44:49.5377555Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 73958 2023-01-11T22:44:49.5377766Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 73959 2023-01-11T22:44:49.5378136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5378315Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5378694Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5378881Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5379242Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5379410Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5379770Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5379958Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5380315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5380488Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5380860Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5381046Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5381411Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5381583Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5381934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5382202Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5382457Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpam7c1atc 2023-01-11T22:44:49.5382710Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppp7yzwqm 2023-01-11T22:44:49.5383020Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpam7c1atc/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5383289Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppp7yzwqm/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5383538Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_r4gj3o_ 2023-01-11T22:44:49.5383799Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_r4gj3o_/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5384021Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5384228Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5384450Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5384696Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpucbhbqmy 2023-01-11T22:44:49.5384962Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpucbhbqmy/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5385181Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5385282Z ok (6.843s) 
2023-01-11T22:44:49.5385302Z 2023-01-11T22:44:49.5385578Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5385684Z Ran 1 test in 6.843s 2023-01-11T22:44:49.5385704Z 2023-01-11T22:44:49.5385779Z OK 2023-01-11T22:44:49.5385813Z 2023-01-11T22:44:49.5385921Z Generating XML reports... 2023-01-11T22:44:49.5386358Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223940.xml 2023-01-11T22:44:49.5386727Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5386903Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5387280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5387472Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5387720Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxvlru6vr 2023-01-11T22:44:49.5387981Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxvlru6vr/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5388001Z 2023-01-11T22:44:49.5388091Z Running tests... 2023-01-11T22:44:49.5388357Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5388668Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5388916Z test_allreduce_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5389134Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74167 2023-01-11T22:44:49.5389349Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74168 2023-01-11T22:44:49.5389567Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 74169 2023-01-11T22:44:49.5389782Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 74170 2023-01-11T22:44:49.5390135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5390310Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5390689Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5390943Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5391317Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5391490Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5391945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5392138Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5392501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5392659Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5393027Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5393218Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5393576Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5393747Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5394128Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5394315Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5394569Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpim902j9v 2023-01-11T22:44:49.5394822Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpim902j9v/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5395073Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbtoqvc8f 2023-01-11T22:44:49.5395324Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpctxg4glw 2023-01-11T22:44:49.5395589Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbtoqvc8f/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5395838Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1f18zqm_ 2023-01-11T22:44:49.5396097Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpctxg4glw/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5396354Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1f18zqm_/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5396582Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5396801Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5397012Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5397237Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5397342Z ok (4.139s) 
2023-01-11T22:44:49.5397361Z 2023-01-11T22:44:49.5397634Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5397744Z Ran 1 test in 4.139s 2023-01-11T22:44:49.5397763Z 2023-01-11T22:44:49.5397854Z OK 2023-01-11T22:44:49.5397873Z 2023-01-11T22:44:49.5397995Z Generating XML reports... 2023-01-11T22:44:49.5398426Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223949.xml 2023-01-11T22:44:49.5398779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5398953Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5399326Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5399514Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5399831Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxeul8hs5 2023-01-11T22:44:49.5400095Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxeul8hs5/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5400115Z 2023-01-11T22:44:49.5400222Z Running tests... 2023-01-11T22:44:49.5400488Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5400846Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5401096Z test_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5401312Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74350 2023-01-11T22:44:49.5401529Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74351 2023-01-11T22:44:49.5401743Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 74352 2023-01-11T22:44:49.5401960Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 74353 2023-01-11T22:44:49.5402339Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5402514Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5402893Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5403064Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5403426Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5403598Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5403973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5404161Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5404519Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5404688Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5405067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5405256Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5405600Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5405768Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5406141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5406325Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5406582Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvl7fb6si 2023-01-11T22:44:49.5406848Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvl7fb6si/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5407072Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5407328Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpatsnd9t4 2023-01-11T22:44:49.5407580Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpatsnd9t4/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5407830Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiqnokrhm 2023-01-11T22:44:49.5408094Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiqnokrhm/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5408316Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5408634Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfaqpbrul 2023-01-11T22:44:49.5408897Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfaqpbrul/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5409121Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5409388Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5409493Z ok (5.122s) 
2023-01-11T22:44:49.5409514Z 2023-01-11T22:44:49.5409775Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5409888Z Ran 1 test in 5.122s 2023-01-11T22:44:49.5409908Z 2023-01-11T22:44:49.5410000Z OK 2023-01-11T22:44:49.5410019Z 2023-01-11T22:44:49.5410140Z Generating XML reports... 2023-01-11T22:44:49.5410574Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223956.xml 2023-01-11T22:44:49.5410949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5411122Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5411498Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5411670Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5411924Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoejr7eae 2023-01-11T22:44:49.5412191Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoejr7eae/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5412212Z 2023-01-11T22:44:49.5412320Z Running tests... 2023-01-11T22:44:49.5412584Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5413126Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5413422Z test_allreduce_basics_cuda_using_work_api (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5413642Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74537 2023-01-11T22:44:49.5413856Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74538 2023-01-11T22:44:49.5414055Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 74539 2023-01-11T22:44:49.5414267Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 74540 2023-01-11T22:44:49.5414644Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5414815Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5415190Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5415385Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5415748Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5415919Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5416273Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5416465Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5416825Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5416996Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5417360Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5417546Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5418015Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5418185Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5418549Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5418721Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5419041Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl1xthag7 2023-01-11T22:44:49.5419324Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl1xthag7/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5419576Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkoqti4l1 2023-01-11T22:44:49.5419841Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkoqti4l1/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5420098Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfherr3a8 2023-01-11T22:44:49.5420363Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfherr3a8/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5420588Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5420792Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5421022Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5421271Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpseekd3zi 2023-01-11T22:44:49.5421532Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpseekd3zi/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5421749Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5421848Z ok (5.135s) 
2023-01-11T22:44:49.5421871Z 2023-01-11T22:44:49.5422147Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5422258Z Ran 1 test in 5.135s 2023-01-11T22:44:49.5422278Z 2023-01-11T22:44:49.5422367Z OK 2023-01-11T22:44:49.5422387Z 2023-01-11T22:44:49.5422492Z Generating XML reports... 2023-01-11T22:44:49.5422923Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224003.xml 2023-01-11T22:44:49.5423294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5423468Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5423847Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5424036Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5424289Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy3bz6ver 2023-01-11T22:44:49.5424558Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy3bz6ver/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5424578Z 2023-01-11T22:44:49.5424668Z Running tests... 2023-01-11T22:44:49.5424931Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5425237Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5425508Z test_allreduce_basics_using_work_api (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5425723Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74724 2023-01-11T22:44:49.5425936Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74725 2023-01-11T22:44:49.5426149Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 74726 2023-01-11T22:44:49.5426358Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 74727 2023-01-11T22:44:49.5426805Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5426963Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5427342Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5427578Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5427956Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5428125Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5428500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5428685Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5429055Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5429210Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5429574Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5429759Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5430125Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5430300Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5430666Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5430877Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5431134Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx1mgtl4f 2023-01-11T22:44:49.5431404Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx1mgtl4f/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5431638Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmpkw6mcb 2023-01-11T22:44:49.5431908Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmpkw6mcb/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5432137Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5432386Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcuystp27 2023-01-11T22:44:49.5432647Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcuystp27/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5432873Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5433119Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe12mpecw 2023-01-11T22:44:49.5433386Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe12mpecw/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5433592Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5433817Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5433916Z ok (4.052s) 
2023-01-11T22:44:49.5433935Z 2023-01-11T22:44:49.5434212Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5434323Z Ran 1 test in 4.053s 2023-01-11T22:44:49.5434343Z 2023-01-11T22:44:49.5434433Z OK 2023-01-11T22:44:49.5434451Z 2023-01-11T22:44:49.5434573Z Generating XML reports... 2023-01-11T22:44:49.5435003Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224011.xml 2023-01-11T22:44:49.5435368Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5435617Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5436002Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5436192Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5436441Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7zlo89zf 2023-01-11T22:44:49.5436753Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7zlo89zf/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5436774Z 2023-01-11T22:44:49.5436885Z Running tests... 2023-01-11T22:44:49.5437151Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5437458Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5437690Z test_allreduce_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5437910Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74907 2023-01-11T22:44:49.5438122Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74908 2023-01-11T22:44:49.5438331Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 74909 2023-01-11T22:44:49.5438544Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 74910 2023-01-11T22:44:49.5438920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5439095Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5439470Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5439643Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5440003Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5440177Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5440545Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5440730Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5441087Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5441258Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5441633Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5441818Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5442158Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5442330Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5442701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5442890Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5443140Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpimshyk9y 2023-01-11T22:44:49.5443410Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpimshyk9y/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5443659Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3ixo8bsn 2023-01-11T22:44:49.5443925Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3ixo8bsn/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5444175Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpewjs1122 2023-01-11T22:44:49.5444487Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpewjs1122/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5444736Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2b_cnapx 2023-01-11T22:44:49.5445000Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2b_cnapx/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5445227Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5445493Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5445717Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5445937Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5446038Z ok (4.134s) 
2023-01-11T22:44:49.5446058Z 2023-01-11T22:44:49.5446315Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5446430Z Ran 1 test in 4.134s 2023-01-11T22:44:49.5446449Z 2023-01-11T22:44:49.5446542Z OK 2023-01-11T22:44:49.5446561Z 2023-01-11T22:44:49.5446682Z Generating XML reports... 2023-01-11T22:44:49.5447116Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224017.xml 2023-01-11T22:44:49.5447484Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5447664Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5448041Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5448229Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5448466Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz6ecj8ap 2023-01-11T22:44:49.5448733Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz6ecj8ap/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5448756Z 2023-01-11T22:44:49.5448864Z Running tests... 2023-01-11T22:44:49.5449127Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5449433Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5449693Z test_allreduce_coalesced_async (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5449915Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75090 2023-01-11T22:44:49.5450130Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75091 2023-01-11T22:44:49.5450328Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 75092 2023-01-11T22:44:49.5450538Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 75093 2023-01-11T22:44:49.5450911Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5451088Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5451469Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5451658Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5452020Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5452189Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5452544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5452732Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5453298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5453560Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5453942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5454124Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5454489Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5454720Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5455105Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5455273Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5455529Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpecf4lyvo 2023-01-11T22:44:49.5455792Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpecf4lyvo/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5456058Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz8vast_a 2023-01-11T22:44:49.5456325Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz8vast_a/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5456577Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6l3glqmh 2023-01-11T22:44:49.5456843Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6l3glqmh/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5457093Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpetv4dtm8 2023-01-11T22:44:49.5457349Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpetv4dtm8/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5457560Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5457786Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5458015Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5458239Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5458478Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5458723Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5458960Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:44:49.5459196Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:44:49.5459587Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5459985Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5460383Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5461131Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.5461246Z warnings.warn( 2023-01-11T22:44:49.5461976Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.5462084Z warnings.warn( 2023-01-11T22:44:49.5462806Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. 
If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.5462982Z warnings.warn( 2023-01-11T22:44:49.5463384Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5464159Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1714: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2023-01-11T22:44:49.5464260Z warnings.warn( 2023-01-11T22:44:49.5464361Z ok (4.129s) 2023-01-11T22:44:49.5464381Z 2023-01-11T22:44:49.5464655Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5464766Z Ran 1 test in 4.129s 2023-01-11T22:44:49.5464790Z 2023-01-11T22:44:49.5464883Z OK 2023-01-11T22:44:49.5464902Z 2023-01-11T22:44:49.5465026Z Generating XML reports... 
2023-01-11T22:44:49.5465453Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224024.xml 2023-01-11T22:44:49.5465815Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5465993Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5466357Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5466549Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5466803Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm44cuh2c 2023-01-11T22:44:49.5467069Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm44cuh2c/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5467093Z 2023-01-11T22:44:49.5467197Z Running tests... 2023-01-11T22:44:49.5467462Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5467773Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5468036Z test_allreduce_coalesced_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5468241Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75273 2023-01-11T22:44:49.5468458Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75274 2023-01-11T22:44:49.5468669Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 75275 2023-01-11T22:44:49.5468880Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 75276 2023-01-11T22:44:49.5469248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5469423Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5469799Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5469987Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5470348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5470506Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5470875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5471062Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5471427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5471597Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5472035Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5472217Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5472578Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5472784Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5473164Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5473350Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5473603Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpst418rvq 2023-01-11T22:44:49.5473869Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpst418rvq/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5474125Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw78q56ht 2023-01-11T22:44:49.5474389Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw78q56ht/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5474618Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5474872Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptug04z61 2023-01-11T22:44:49.5475123Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptug04z61/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5475348Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5475561Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5475813Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzdmk2pja 2023-01-11T22:44:49.5476077Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzdmk2pja/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5476304Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5476407Z ok (4.113s) 
2023-01-11T22:44:49.5476426Z 2023-01-11T22:44:49.5476699Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5476795Z Ran 1 test in 4.114s 2023-01-11T22:44:49.5476815Z 2023-01-11T22:44:49.5476906Z OK 2023-01-11T22:44:49.5476928Z 2023-01-11T22:44:49.5477050Z Generating XML reports... 2023-01-11T22:44:49.5477479Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224030.xml 2023-01-11T22:44:49.5477851Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5478025Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5478398Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5478590Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5478840Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx032rszs 2023-01-11T22:44:49.5479087Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx032rszs/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5479107Z 2023-01-11T22:44:49.5479217Z Running tests... 2023-01-11T22:44:49.5479482Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5479793Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5480055Z test_allreduce_coalesced_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5480274Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75456 2023-01-11T22:44:49.5480490Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75457 2023-01-11T22:44:49.5480773Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 75458 2023-01-11T22:44:49.5480971Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 75459 2023-01-11T22:44:49.5481349Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5481566Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5481961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5482149Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5482512Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5482684Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5483062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5483248Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5483597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5483768Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5484149Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5484334Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5484690Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5484853Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5485223Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5485415Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5485654Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppbffodr3 2023-01-11T22:44:49.5485922Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppbffodr3/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5486151Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5486405Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpswe5ll1r 2023-01-11T22:44:49.5486665Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpswe5ll1r/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5486914Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp4c0jaqe 2023-01-11T22:44:49.5487174Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp4c0jaqe/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5487426Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv0de901u 2023-01-11T22:44:49.5487687Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv0de901u/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5487897Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5488117Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5488342Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5488445Z ok (4.137s) 
2023-01-11T22:44:49.5488466Z 2023-01-11T22:44:49.5488738Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5488848Z Ran 1 test in 4.137s 2023-01-11T22:44:49.5488869Z 2023-01-11T22:44:49.5488962Z OK 2023-01-11T22:44:49.5488981Z 2023-01-11T22:44:49.5489103Z Generating XML reports... 2023-01-11T22:44:49.5489594Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224036.xml 2023-01-11T22:44:49.5489962Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5490138Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5490559Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5490760Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5491020Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm8pvkw4c 2023-01-11T22:44:49.5491289Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm8pvkw4c/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5491309Z 2023-01-11T22:44:49.5491418Z Running tests... 2023-01-11T22:44:49.5491689Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5491998Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5492266Z test_allreduce_coalesced_checks_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5492469Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75639 2023-01-11T22:44:49.5492686Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75640 2023-01-11T22:44:49.5493079Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 75641 2023-01-11T22:44:49.5493298Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 75642 2023-01-11T22:44:49.5493666Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5493833Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5494202Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5494400Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5494749Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5494922Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5495298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5495488Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5495849Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5496020Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5496388Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5496580Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5496949Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5497103Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5497472Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5497660Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5497915Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfei_iya1 2023-01-11T22:44:49.5498185Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfei_iya1/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5498437Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyx6sxj8i 2023-01-11T22:44:49.5498816Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyx6sxj8i/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5499073Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprtn86q_a 2023-01-11T22:44:49.5499324Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprtn86q_a/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5499556Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5499834Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5500070Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5500323Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcm4lgk6e 2023-01-11T22:44:49.5500591Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcm4lgk6e/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5500815Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5500920Z ok (5.128s) 
2023-01-11T22:44:49.5500940Z 2023-01-11T22:44:49.5501221Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5501319Z Ran 1 test in 5.128s 2023-01-11T22:44:49.5501338Z 2023-01-11T22:44:49.5501433Z OK 2023-01-11T22:44:49.5501451Z 2023-01-11T22:44:49.5501576Z Generating XML reports... 2023-01-11T22:44:49.5502013Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224043.xml 2023-01-11T22:44:49.5502381Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5502558Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5502939Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5503135Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5503372Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6aoxe65u 2023-01-11T22:44:49.5503641Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6aoxe65u/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5503660Z 2023-01-11T22:44:49.5503774Z Running tests... 2023-01-11T22:44:49.5504041Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5504357Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5504623Z test_allreduce_coalesced_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5504844Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75826 2023-01-11T22:44:49.5505061Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75827 2023-01-11T22:44:49.5505276Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 75828 2023-01-11T22:44:49.5505475Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 75829 2023-01-11T22:44:49.5505850Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5506027Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5506412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5506604Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5506965Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5507140Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5507511Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5507745Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5508120Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5508297Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5508713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5508909Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5509277Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5509451Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5509820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5510009Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5510247Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu2nqzixe 2023-01-11T22:44:49.5510519Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu2nqzixe/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5510747Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5511003Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu_jmxst_ 2023-01-11T22:44:49.5511270Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu_jmxst_/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5511521Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg5987r8q 2023-01-11T22:44:49.5511779Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg5987r8q/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5512026Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphevji4xj 2023-01-11T22:44:49.5512280Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphevji4xj/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5512510Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5512735Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5512963Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5513067Z ok (4.431s) 
2023-01-11T22:44:49.5513087Z 2023-01-11T22:44:49.5513364Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5513478Z Ran 1 test in 4.431s 2023-01-11T22:44:49.5513497Z 2023-01-11T22:44:49.5513592Z OK 2023-01-11T22:44:49.5513611Z 2023-01-11T22:44:49.5513735Z Generating XML reports... 2023-01-11T22:44:49.5514152Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224050.xml 2023-01-11T22:44:49.5514526Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5514703Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5515079Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5515273Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5515524Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphgq_ibwv 2023-01-11T22:44:49.5515794Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphgq_ibwv/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5515814Z 2023-01-11T22:44:49.5515923Z Running tests... 2023-01-11T22:44:49.5516172Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5516484Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5516798Z test_allreduce_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5517017Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76033 2023-01-11T22:44:49.5517233Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76034 2023-01-11T22:44:49.5517449Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 76035 2023-01-11T22:44:49.5517706Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 76036 2023-01-11T22:44:49.5518095Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5518254Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5518634Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5518828Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5519195Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5519371Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5519741Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5519932Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5520293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5520466Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5520825Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5521014Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5521376Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5521549Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5521919Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5522112Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5522369Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpollued7w 2023-01-11T22:44:49.5522641Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpollued7w/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5522893Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp44my6d3d 2023-01-11T22:44:49.5523125Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4ou7u7rv 2023-01-11T22:44:49.5523392Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp44my6d3d/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5523658Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4ou7u7rv/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5523913Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvg9odmxl 2023-01-11T22:44:49.5524181Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvg9odmxl/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5524413Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5524643Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5524865Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5525074Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5525182Z ok (4.228s) 
2023-01-11T22:44:49.5525262Z 2023-01-11T22:44:49.5525549Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5525663Z Ran 1 test in 4.229s 2023-01-11T22:44:49.5525683Z 2023-01-11T22:44:49.5525780Z OK 2023-01-11T22:44:49.5525799Z 2023-01-11T22:44:49.5525924Z Generating XML reports... 2023-01-11T22:44:49.5526358Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224057.xml 2023-01-11T22:44:49.5526778Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5526962Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5527330Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5527521Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5527776Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqqjmzqt8 2023-01-11T22:44:49.5528052Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqqjmzqt8/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5528072Z 2023-01-11T22:44:49.5528183Z Running tests... 2023-01-11T22:44:49.5528447Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5528760Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5529020Z test_allreduce_stress_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5529219Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76240 2023-01-11T22:44:49.5529436Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76241 2023-01-11T22:44:49.5529652Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 76242 2023-01-11T22:44:49.5529864Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 76243 2023-01-11T22:44:49.5530244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5530421Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5530802Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5530998Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5531379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5531557Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5531932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5532123Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5532491Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5532664Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5533210Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5533404Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5533779Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5533934Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5534304Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5534493Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5534748Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8sf2z1z1 2023-01-11T22:44:49.5535139Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8sf2z1z1/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5535370Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5535625Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps0j3i3qv 2023-01-11T22:44:49.5535948Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps0j3i3qv/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5536213Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgo6bkrn0 2023-01-11T22:44:49.5536460Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgo6bkrn0/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5536715Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfph9yn13 2023-01-11T22:44:49.5536981Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfph9yn13/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5537213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5537441Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5537667Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5537771Z ok (5.523s) 
2023-01-11T22:44:49.5537790Z 2023-01-11T22:44:49.5538074Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5538171Z Ran 1 test in 5.523s 2023-01-11T22:44:49.5538191Z 2023-01-11T22:44:49.5538285Z OK 2023-01-11T22:44:49.5538304Z 2023-01-11T22:44:49.5538429Z Generating XML reports... 2023-01-11T22:44:49.5538861Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224104.xml 2023-01-11T22:44:49.5539230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5539413Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5539792Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5539985Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5540221Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy7ks46oh 2023-01-11T22:44:49.5540493Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy7ks46oh/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5540513Z 2023-01-11T22:44:49.5540625Z Running tests... 2023-01-11T22:44:49.5540893Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5541203Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5541457Z test_barrier_implies_wait (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5541681Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76451 2023-01-11T22:44:49.5541901Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76452 2023-01-11T22:44:49.5542116Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 76453 2023-01-11T22:44:49.5542308Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 76454 2023-01-11T22:44:49.5542686Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5542863Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5543245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5543437Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5543803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5544042Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5544424Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5544593Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5545008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5545190Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5545569Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5545756Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5546119Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5546298Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5546669Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5546858Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5547096Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpae0okhvm 2023-01-11T22:44:49.5547369Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpae0okhvm/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5547621Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9smgbmvw 2023-01-11T22:44:49.5547888Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9smgbmvw/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5548140Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz35jkeql 2023-01-11T22:44:49.5548410Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz35jkeql/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5548642Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5548868Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5549072Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5549326Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfi90k0gt 2023-01-11T22:44:49.5549589Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfi90k0gt/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5549814Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5549916Z ok (4.132s) 
2023-01-11T22:44:49.5549936Z 2023-01-11T22:44:49.5550205Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5550326Z Ran 1 test in 4.132s 2023-01-11T22:44:49.5550345Z 2023-01-11T22:44:49.5550440Z OK 2023-01-11T22:44:49.5550459Z 2023-01-11T22:44:49.5550584Z Generating XML reports... 2023-01-11T22:44:49.5550996Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224111.xml 2023-01-11T22:44:49.5551362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5551540Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5551918Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5552108Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5552357Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv_7ctb4w 2023-01-11T22:44:49.5552621Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv_7ctb4w/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5552696Z 2023-01-11T22:44:49.5552812Z Running tests... 2023-01-11T22:44:49.5553067Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5553380Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5553628Z test_broadcast_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5553890Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76634 2023-01-11T22:44:49.5554117Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76635 2023-01-11T22:44:49.5554333Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 76636 2023-01-11T22:44:49.5554550Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 76637 2023-01-11T22:44:49.5554929Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5555107Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5555465Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5555656Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5556024Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5556200Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5556571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5556759Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5557119Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5557296Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5557644Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5557817Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5558192Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5558385Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5558761Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5558949Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5559205Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2fe25up4 2023-01-11T22:44:49.5559475Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2fe25up4/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5559733Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpos7tu27_ 2023-01-11T22:44:49.5559982Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpos7tu27_/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5560232Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3dh3gyd_ 2023-01-11T22:44:49.5560502Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3dh3gyd_/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5560754Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk5298gs7 2023-01-11T22:44:49.5561016Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk5298gs7/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5561245Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5561474Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5561774Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5561982Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5562085Z ok (4.013s) 
2023-01-11T22:44:49.5562105Z 2023-01-11T22:44:49.5562383Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5562497Z Ran 1 test in 4.013s 2023-01-11T22:44:49.5562517Z 2023-01-11T22:44:49.5562666Z OK 2023-01-11T22:44:49.5562687Z 2023-01-11T22:44:49.5562819Z Generating XML reports... 2023-01-11T22:44:49.5563260Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224118.xml 2023-01-11T22:44:49.5563630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5563807Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5564171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5564361Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5564615Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqzkkw6nr 2023-01-11T22:44:49.5564882Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqzkkw6nr/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5564905Z 2023-01-11T22:44:49.5565016Z Running tests... 2023-01-11T22:44:49.5565282Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5565592Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5565851Z test_broadcast_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5566052Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76817 2023-01-11T22:44:49.5566274Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76818 2023-01-11T22:44:49.5566489Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 76819 2023-01-11T22:44:49.5566702Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 76820 2023-01-11T22:44:49.5567079Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5567262Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5567645Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5567836Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5568198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5568354Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5568731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5568921Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5569282Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5569459Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5569826Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5570000Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5570376Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5570548Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5570998Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5571187Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5571446Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9ngcyfrc 2023-01-11T22:44:49.5571716Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9ngcyfrc/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5571993Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5572255Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe2_z8g6h 2023-01-11T22:44:49.5572522Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe2_z8g6h/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5572773Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppnejpidx 2023-01-11T22:44:49.5573179Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplezl9k4a 2023-01-11T22:44:49.5573467Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppnejpidx/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5573726Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplezl9k4a/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5573954Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5574183Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5574408Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5574512Z ok (5.130s) 
2023-01-11T22:44:49.5574531Z 2023-01-11T22:44:49.5574808Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5574904Z Ran 1 test in 5.131s 2023-01-11T22:44:49.5574942Z 2023-01-11T22:44:49.5575018Z OK 2023-01-11T22:44:49.5575036Z 2023-01-11T22:44:49.5575165Z Generating XML reports... 2023-01-11T22:44:49.5575603Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224124.xml 2023-01-11T22:44:49.5575976Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5576153Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5576534Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5576728Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5576980Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmyc8qe67 2023-01-11T22:44:49.5577229Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmyc8qe67/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5577249Z 2023-01-11T22:44:49.5577359Z Running tests... 2023-01-11T22:44:49.5577632Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5577943Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5578194Z test_broadcast_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5578414Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77004 2023-01-11T22:44:49.5578634Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77005 2023-01-11T22:44:49.5578852Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 77006 2023-01-11T22:44:49.5579047Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 77007 2023-01-11T22:44:49.5579417Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5579594Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5580076Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5580266Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5580634Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5580810Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5581252Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5581453Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5581810Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5581984Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5582355Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5582549Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5582912Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5583085Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5583458Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5583647Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5583884Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9ycx1cjx 2023-01-11T22:44:49.5584154Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9ycx1cjx/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5584381Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5584637Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp004ilm2u 2023-01-11T22:44:49.5584904Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp004ilm2u/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5585132Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5585393Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2hzux8om 2023-01-11T22:44:49.5585660Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2hzux8om/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5585911Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr4y6m2c1 2023-01-11T22:44:49.5586155Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr4y6m2c1/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5586379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5586608Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5586710Z ok (4.108s) 
2023-01-11T22:44:49.5586731Z 2023-01-11T22:44:49.5587002Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5587118Z Ran 1 test in 4.109s 2023-01-11T22:44:49.5587137Z 2023-01-11T22:44:49.5587232Z OK 2023-01-11T22:44:49.5587251Z 2023-01-11T22:44:49.5587377Z Generating XML reports... 2023-01-11T22:44:49.5587795Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224132.xml 2023-01-11T22:44:49.5588169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5588344Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5588722Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5588978Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5589234Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdxffg29t 2023-01-11T22:44:49.5589505Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdxffg29t/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5589525Z 2023-01-11T22:44:49.5589635Z Running tests... 2023-01-11T22:44:49.5589952Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5590259Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5590509Z test_broadcast_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5590729Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77187 2023-01-11T22:44:49.5590945Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77188 2023-01-11T22:44:49.5591160Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 77189 2023-01-11T22:44:49.5591379Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 77190 2023-01-11T22:44:49.5591752Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5591929Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5592291Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5592481Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5592849Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5593021Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5593391Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5593583Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5593946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5594120Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5594502Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5594670Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5595032Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5595206Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5595578Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5595771Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5596028Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsy41om48 2023-01-11T22:44:49.5596295Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsy41om48/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5596523Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5596762Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg5nf1qvc 2023-01-11T22:44:49.5597029Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg5nf1qvc/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5597282Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf6oqke5j 2023-01-11T22:44:49.5597548Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf6oqke5j/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5597801Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpodylm2rw 2023-01-11T22:44:49.5598130Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpodylm2rw/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5598358Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5598578Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5598845Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5598936Z ok (4.228s) 
2023-01-11T22:44:49.5598955Z 2023-01-11T22:44:49.5599232Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5599350Z Ran 1 test in 4.228s 2023-01-11T22:44:49.5599369Z 2023-01-11T22:44:49.5599463Z OK 2023-01-11T22:44:49.5599482Z 2023-01-11T22:44:49.5599606Z Generating XML reports... 2023-01-11T22:44:49.5600040Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224138.xml 2023-01-11T22:44:49.5600411Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5600586Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5600945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5601140Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5601389Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2bcenyl0 2023-01-11T22:44:49.5601659Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2bcenyl0/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5601678Z 2023-01-11T22:44:49.5601789Z Running tests... 2023-01-11T22:44:49.5602054Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5602366Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5602628Z test_broadcast_stress_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5602846Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77394 2023-01-11T22:44:49.5603046Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77395 2023-01-11T22:44:49.5603267Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 77396 2023-01-11T22:44:49.5603481Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 77397 2023-01-11T22:44:49.5603853Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5604029Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5604408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5604602Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5604968Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5605122Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5605495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5605689Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5606048Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5606222Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5606595Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5606785Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5607229Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5607402Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5607753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5607992Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5608256Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjlwg3m4z 2023-01-11T22:44:49.5608525Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjlwg3m4z/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5608782Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvky6vgzn 2023-01-11T22:44:49.5609051Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvky6vgzn/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5609308Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgrv0s5ki 2023-01-11T22:44:49.5609572Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgrv0s5ki/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5609804Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd_fa6zrr 2023-01-11T22:44:49.5610068Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd_fa6zrr/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5610298Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5610526Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5610751Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5610975Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5611081Z ok (5.534s) 
2023-01-11T22:44:49.5611104Z 2023-01-11T22:44:49.5611385Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5611482Z Ran 1 test in 5.535s 2023-01-11T22:44:49.5611522Z 2023-01-11T22:44:49.5611599Z OK 2023-01-11T22:44:49.5611617Z 2023-01-11T22:44:49.5611742Z Generating XML reports... 2023-01-11T22:44:49.5612172Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224145.xml 2023-01-11T22:44:49.5612547Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5612725Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5613300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5613499Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5613755Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsb1awq0n 2023-01-11T22:44:49.5614013Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsb1awq0n/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5614053Z 2023-01-11T22:44:49.5614145Z Running tests... 2023-01-11T22:44:49.5614415Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5614732Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5614978Z test_empty_tensors (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5615198Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77605 2023-01-11T22:44:49.5615417Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77606 2023-01-11T22:44:49.5615633Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 77607 2023-01-11T22:44:49.5615829Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 77608 2023-01-11T22:44:49.5616305Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5616484Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5616866Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5617114Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5617499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5617677Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5618053Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5618242Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5618592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5618767Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5619138Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5619325Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5619694Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5619868Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5620247Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5620437Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5620675Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfzafe9xa 2023-01-11T22:44:49.5620950Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfzafe9xa/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5621182Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5621437Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmzw9b8d7 2023-01-11T22:44:49.5621711Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmzw9b8d7/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5621964Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc9typ5a0 2023-01-11T22:44:49.5622230Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc9typ5a0/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5622483Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv6tb3mm1 2023-01-11T22:44:49.5622746Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv6tb3mm1/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5622961Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5623190Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5623417Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5623521Z ok (4.122s) 
2023-01-11T22:44:49.5623541Z 2023-01-11T22:44:49.5623819Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5623936Z Ran 1 test in 4.123s 2023-01-11T22:44:49.5623955Z 2023-01-11T22:44:49.5624051Z OK 2023-01-11T22:44:49.5624070Z 2023-01-11T22:44:49.5624195Z Generating XML reports... 2023-01-11T22:44:49.5624608Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224152.xml 2023-01-11T22:44:49.5624979Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5625234Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5625624Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5625817Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5626075Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7ue9h9rm 2023-01-11T22:44:49.5626389Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7ue9h9rm/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5626410Z 2023-01-11T22:44:49.5626530Z Running tests... 2023-01-11T22:44:49.5626800Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5627093Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5627338Z test_gather_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5627561Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77788 2023-01-11T22:44:49.5627778Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77789 2023-01-11T22:44:49.5627993Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 77790 2023-01-11T22:44:49.5628208Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 77791 2023-01-11T22:44:49.5628584Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5628762Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5629126Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5629322Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5629684Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5629865Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5630241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5630429Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5630791Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5630965Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5631343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5631514Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5631899Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5632079Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5632453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5632643Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5632898Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqcymehw7 2023-01-11T22:44:49.5633174Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqcymehw7/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5633402Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5633637Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp46xay6xi 2023-01-11T22:44:49.5633909Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp46xay6xi/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5634201Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5634456Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqyrfx96s 2023-01-11T22:44:49.5634725Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqyrfx96s/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5634980Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf42psrd9 2023-01-11T22:44:49.5635293Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf42psrd9/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5635525Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5635753Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5635837Z ok (4.143s) 
2023-01-11T22:44:49.5635857Z 2023-01-11T22:44:49.5636136Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5636254Z Ran 1 test in 4.143s 2023-01-11T22:44:49.5636274Z 2023-01-11T22:44:49.5636368Z OK 2023-01-11T22:44:49.5636387Z 2023-01-11T22:44:49.5636512Z Generating XML reports... 2023-01-11T22:44:49.5636946Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224159.xml 2023-01-11T22:44:49.5637315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5637496Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5637858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5638052Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5638306Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpys2ispl5 2023-01-11T22:44:49.5638574Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpys2ispl5/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5638598Z 2023-01-11T22:44:49.5638708Z Running tests... 2023-01-11T22:44:49.5638977Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5639294Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5639547Z test_gather_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5639771Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77971 2023-01-11T22:44:49.5639971Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77972 2023-01-11T22:44:49.5640186Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 77973 2023-01-11T22:44:49.5640397Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 77974 2023-01-11T22:44:49.5640770Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5640953Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5641330Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5641521Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5641892Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5642049Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5642421Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5642611Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5643082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5643368Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5643804Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5644047Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5644446Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5644713Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5645080Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5645430Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5645734Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsvupld4q 2023-01-11T22:44:49.5646043Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsvupld4q/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5646338Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpagbgol44 2023-01-11T22:44:49.5646591Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpagbgol44/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5646879Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4pde2y1z 2023-01-11T22:44:49.5647183Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4pde2y1z/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5647481Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk90xtux8 2023-01-11T22:44:49.5647780Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk90xtux8/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5648085Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5648352Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5648623Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5648831Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5648967Z ok (5.150s) 
2023-01-11T22:44:49.5648987Z 2023-01-11T22:44:49.5649296Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5649492Z Ran 1 test in 5.150s 2023-01-11T22:44:49.5649512Z 2023-01-11T22:44:49.5649645Z OK 2023-01-11T22:44:49.5649665Z 2023-01-11T22:44:49.5649824Z Generating XML reports... 2023-01-11T22:44:49.5650331Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224205.xml 2023-01-11T22:44:49.5650740Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5650951Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5651320Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5651549Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5651849Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf6qfboqp 2023-01-11T22:44:49.5652154Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf6qfboqp/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5652178Z 2023-01-11T22:44:49.5652325Z Running tests... 2023-01-11T22:44:49.5652628Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5653218Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5653512Z test_gather_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5653712Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78158 2023-01-11T22:44:49.5654084Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78159 2023-01-11T22:44:49.5654337Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 78160 2023-01-11T22:44:49.5654627Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 78161 2023-01-11T22:44:49.5655056Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5655330Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5655772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5656047Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5656401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5656626Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5657042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5657265Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5657669Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5657882Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5658291Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5658512Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5658957Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5659119Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5659542Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5659762Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5660085Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa0cqn3t5 2023-01-11T22:44:49.5660389Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa0cqn3t5/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5660681Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphjjcb4cu 2023-01-11T22:44:49.5660988Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphjjcb4cu/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5661285Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpljk4a_g0 2023-01-11T22:44:49.5661534Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpljk4a_g0/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5661866Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7n7c7p22 2023-01-11T22:44:49.5662171Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7n7c7p22/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5662433Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5662690Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5681771Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5682066Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5682160Z ok (4.137s) 
2023-01-11T22:44:49.5682183Z 2023-01-11T22:44:49.5682481Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5682584Z Ran 1 test in 4.138s 2023-01-11T22:44:49.5682604Z 2023-01-11T22:44:49.5682685Z OK 2023-01-11T22:44:49.5682705Z 2023-01-11T22:44:49.5682945Z Generating XML reports... 2023-01-11T22:44:49.5683399Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224213.xml 2023-01-11T22:44:49.5683771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5683949Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5684387Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5684589Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5684847Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2qx2g_6q 2023-01-11T22:44:49.5685116Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2qx2g_6q/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5685138Z 2023-01-11T22:44:49.5685230Z Running tests... 2023-01-11T22:44:49.5685508Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5685822Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5686092Z test_gather_noncontiguous_input (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5686314Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78341 2023-01-11T22:44:49.5686532Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78342 2023-01-11T22:44:49.5686747Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 78343 2023-01-11T22:44:49.5686962Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 78344 2023-01-11T22:44:49.5687336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5687496Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5687878Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5688069Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5688431Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5688607Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5688987Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5689177Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5689535Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5689690Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5690072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5690262Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5690622Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5690794Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5691172Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5691363Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5691620Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzwhjhd4i 2023-01-11T22:44:49.5691891Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzwhjhd4i/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5692126Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiacg0n9e 2023-01-11T22:44:49.5692468Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiacg0n9e/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5692695Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5693185Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7w52cr40 2023-01-11T22:44:49.5693559Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7w52cr40/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5693820Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp96k6fgns 2023-01-11T22:44:49.5694080Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp96k6fgns/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5694305Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5694505Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5694738Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5694842Z ok (4.028s) 
2023-01-11T22:44:49.5694862Z 2023-01-11T22:44:49.5695150Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5695264Z Ran 1 test in 4.028s 2023-01-11T22:44:49.5695284Z 2023-01-11T22:44:49.5695380Z OK 2023-01-11T22:44:49.5695399Z 2023-01-11T22:44:49.5695523Z Generating XML reports... 2023-01-11T22:44:49.5695966Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224219.xml 2023-01-11T22:44:49.5696338Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5696498Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5696878Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5697071Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5697325Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph1ch2lr7 2023-01-11T22:44:49.5697596Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph1ch2lr7/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5697615Z 2023-01-11T22:44:49.5697724Z Running tests... 2023-01-11T22:44:49.5697996Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5698308Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5698533Z test_gather_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5698747Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78524 2023-01-11T22:44:49.5698967Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78525 2023-01-11T22:44:49.5699185Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 78526 2023-01-11T22:44:49.5699399Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 78527 2023-01-11T22:44:49.5699770Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5699944Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5700324Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5700500Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5700861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5701033Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5701405Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5701699Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5702068Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5702242Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5702660Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5702859Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5703218Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5703390Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5703759Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5703951Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5704204Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph6y9xyvx 2023-01-11T22:44:49.5704474Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph6y9xyvx/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5704726Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvsk9kgv7 2023-01-11T22:44:49.5705000Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvsk9kgv7/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5705229Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5705440Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5705691Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr28pxwed 2023-01-11T22:44:49.5705957Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr28pxwed/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5706214Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm0kpbmjt 2023-01-11T22:44:49.5706478Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm0kpbmjt/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5706697Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5706925Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5707028Z ok (4.735s) 
2023-01-11T22:44:49.5707048Z 2023-01-11T22:44:49.5707305Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5707419Z Ran 1 test in 4.736s 2023-01-11T22:44:49.5707438Z 2023-01-11T22:44:49.5707532Z OK 2023-01-11T22:44:49.5707551Z 2023-01-11T22:44:49.5707675Z Generating XML reports... 2023-01-11T22:44:49.5708115Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224226.xml 2023-01-11T22:44:49.5708487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5708664Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5709045Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5709239Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5709475Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp34xge_7f 2023-01-11T22:44:49.5709742Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp34xge_7f/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5709761Z 2023-01-11T22:44:49.5709869Z Running tests... 2023-01-11T22:44:49.5710134Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5710446Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5710765Z test_gather_stress_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5710985Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78731 2023-01-11T22:44:49.5711201Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78732 2023-01-11T22:44:49.5711400Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 78733 2023-01-11T22:44:49.5711658Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 78734 2023-01-11T22:44:49.5712055Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5712231Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5712611Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5712810Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5713176Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5713349Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5713703Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5713896Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5714260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5714436Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5714805Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5714990Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5715354Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5715526Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5715903Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5716076Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5716331Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl49dhp8q 2023-01-11T22:44:49.5716602Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl49dhp8q/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5716851Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_4eny977 2023-01-11T22:44:49.5717113Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_4eny977/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5717369Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk_d1g04_ 2023-01-11T22:44:49.5717632Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk_d1g04_/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5717859Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5718069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5718324Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphmqihbn1 2023-01-11T22:44:49.5718589Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphmqihbn1/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5718813Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5719035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5719136Z ok (7.225s) 
2023-01-11T22:44:49.5719213Z 2023-01-11T22:44:49.5719495Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5719610Z Ran 1 test in 7.225s 2023-01-11T22:44:49.5719629Z 2023-01-11T22:44:49.5719722Z OK 2023-01-11T22:44:49.5719741Z 2023-01-11T22:44:49.5719848Z Generating XML reports... 2023-01-11T22:44:49.5720282Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224233.xml 2023-01-11T22:44:49.5720705Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5720889Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5721273Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5721466Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5721720Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfnvkw59u 2023-01-11T22:44:49.5721994Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfnvkw59u/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5722014Z 2023-01-11T22:44:49.5722123Z Running tests... 2023-01-11T22:44:49.5722372Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5722683Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5722945Z test_multi_device_constructor (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5723162Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78942 2023-01-11T22:44:49.5723381Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78943 2023-01-11T22:44:49.5723597Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 78944 2023-01-11T22:44:49.5723809Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 78945 2023-01-11T22:44:49.5724182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5724340Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5724719Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5724915Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5725281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5725454Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5725827Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5726014Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5726378Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5726533Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5726911Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5727098Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5727459Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5727630Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5728002Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5728190Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5728446Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkcuhbkaf 2023-01-11T22:44:49.5728781Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkcuhbkaf/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5729017Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpunuoa4qr 2023-01-11T22:44:49.5729283Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpunuoa4qr/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5729579Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt1_b74w_ 2023-01-11T22:44:49.5729850Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt1_b74w_/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5730079Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5730306Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5730561Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpedfrppmr 2023-01-11T22:44:49.5730831Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpedfrppmr/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5731039Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5731261Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5731364Z ok (4.099s) 
2023-01-11T22:44:49.5731384Z 2023-01-11T22:44:49.5731663Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5731773Z Ran 1 test in 4.099s 2023-01-11T22:44:49.5731793Z 2023-01-11T22:44:49.5731886Z OK 2023-01-11T22:44:49.5731905Z 2023-01-11T22:44:49.5732029Z Generating XML reports... 2023-01-11T22:44:49.5732495Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224242.xml 2023-01-11T22:44:49.5733066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5733249Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5733637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5733834Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5734089Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6zlrc0bt 2023-01-11T22:44:49.5734360Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6zlrc0bt/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5734381Z 2023-01-11T22:44:49.5734489Z Running tests... 2023-01-11T22:44:49.5734758Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5735069Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5735295Z test_reduce_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5735516Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79129 2023-01-11T22:44:49.5735732Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79130 2023-01-11T22:44:49.5735946Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79131 2023-01-11T22:44:49.5736156Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79132 2023-01-11T22:44:49.5736530Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5736706Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5737085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5737273Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5737622Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5737889Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5738273Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5738462Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5738916Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5739102Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5739478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5739666Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5740017Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5740194Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5740562Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5740748Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5741001Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9og02yiz 2023-01-11T22:44:49.5741271Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9og02yiz/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5741500Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5741753Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7ixz4rim 2023-01-11T22:44:49.5742017Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7ixz4rim/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5742253Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptoauorlv 2023-01-11T22:44:49.5742517Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptoauorlv/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5742742Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5742966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5743222Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmg8e02r7 2023-01-11T22:44:49.5743486Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmg8e02r7/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5743707Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5743810Z ok (4.129s) 
2023-01-11T22:44:49.5743830Z 2023-01-11T22:44:49.5744088Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5744207Z Ran 1 test in 4.129s 2023-01-11T22:44:49.5744226Z 2023-01-11T22:44:49.5744320Z OK 2023-01-11T22:44:49.5744339Z 2023-01-11T22:44:49.5744463Z Generating XML reports... 2023-01-11T22:44:49.5744900Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224249.xml 2023-01-11T22:44:49.5745267Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5745446Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5745827Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5746017Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5746255Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppqx7pizq 2023-01-11T22:44:49.5746525Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppqx7pizq/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5746604Z 2023-01-11T22:44:49.5746719Z Running tests... 2023-01-11T22:44:49.5746993Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5747307Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5747560Z test_reduce_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5747829Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79312 2023-01-11T22:44:49.5748058Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79313 2023-01-11T22:44:49.5748254Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79314 2023-01-11T22:44:49.5748470Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79315 2023-01-11T22:44:49.5748850Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5749032Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5749410Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5749601Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5749971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5750144Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5750516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5750687Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5751047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5751226Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5751595Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5751780Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5752137Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5752314Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5752694Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5752862Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5753117Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbif6g_ww 2023-01-11T22:44:49.5753386Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbif6g_ww/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5753639Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpco_bndk5 2023-01-11T22:44:49.5753904Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpco_bndk5/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5754156Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz6a9yi9h 2023-01-11T22:44:49.5754420Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz6a9yi9h/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5754647Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5754873Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5755082Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5755333Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp72493gu1 2023-01-11T22:44:49.5755664Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp72493gu1/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5755891Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5755995Z ok (5.131s) 
2023-01-11T22:44:49.5756015Z 2023-01-11T22:44:49.5756288Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5756402Z Ran 1 test in 5.132s 2023-01-11T22:44:49.5756468Z 2023-01-11T22:44:49.5756568Z OK 2023-01-11T22:44:49.5756587Z 2023-01-11T22:44:49.5756695Z Generating XML reports... 2023-01-11T22:44:49.5757138Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224255.xml 2023-01-11T22:44:49.5757512Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5757687Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5758067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5758256Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5758508Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm3u9ffzd 2023-01-11T22:44:49.5758774Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm3u9ffzd/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5758797Z 2023-01-11T22:44:49.5758906Z Running tests... 2023-01-11T22:44:49.5759152Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5759463Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5759707Z test_reduce_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5759923Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79499 2023-01-11T22:44:49.5760143Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79500 2023-01-11T22:44:49.5760356Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79501 2023-01-11T22:44:49.5760569Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79502 2023-01-11T22:44:49.5760941Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5761102Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5761486Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5761677Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5762042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5762216Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5762595Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5762783Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5763143Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5763318Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5763679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5763865Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5764224Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5764394Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5764856Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5765045Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5765304Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7ygmj5e8 2023-01-11T22:44:49.5765573Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7ygmj5e8/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5765858Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprpfi7ayo 2023-01-11T22:44:49.5766138Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprpfi7ayo/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5766370Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5766595Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5766850Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7p9rm0up 2023-01-11T22:44:49.5767122Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7p9rm0up/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5767374Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq4e171m5 2023-01-11T22:44:49.5767636Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq4e171m5/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5767863Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5768071Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5768173Z ok (4.122s) 
2023-01-11T22:44:49.5768194Z 2023-01-11T22:44:49.5768471Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5768582Z Ran 1 test in 4.122s 2023-01-11T22:44:49.5768602Z 2023-01-11T22:44:49.5768692Z OK 2023-01-11T22:44:49.5768711Z 2023-01-11T22:44:49.5768839Z Generating XML reports... 2023-01-11T22:44:49.5769270Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224303.xml 2023-01-11T22:44:49.5769637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5769796Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5770179Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5770372Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5770621Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_yi19q1l 2023-01-11T22:44:49.5770884Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_yi19q1l/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5770904Z 2023-01-11T22:44:49.5771010Z Running tests... 2023-01-11T22:44:49.5771284Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5771595Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5771834Z test_reduce_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5772035Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79682 2023-01-11T22:44:49.5772251Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79683 2023-01-11T22:44:49.5772464Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79684 2023-01-11T22:44:49.5772677Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79685 2023-01-11T22:44:49.5773258Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5773441Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5773931Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5774123Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5774466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5774639Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5775073Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5775271Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5775637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5775811Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5776181Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5776372Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5776724Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5776896Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5777266Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5777453Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5777709Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa2hvfe60 2023-01-11T22:44:49.5777978Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa2hvfe60/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5778230Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp76cguo0y 2023-01-11T22:44:49.5778501Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp76cguo0y/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5778753Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxtzemdmh 2023-01-11T22:44:49.5779003Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxtzemdmh/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5779255Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6gqgy991 2023-01-11T22:44:49.5779517Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6gqgy991/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5779745Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5779972Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5780199Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5780430Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5780533Z ok (4.466s) 
2023-01-11T22:44:49.5780553Z 2023-01-11T22:44:49.5780824Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5780921Z Ran 1 test in 4.466s 2023-01-11T22:44:49.5780940Z 2023-01-11T22:44:49.5781032Z OK 2023-01-11T22:44:49.5781051Z 2023-01-11T22:44:49.5781175Z Generating XML reports... 2023-01-11T22:44:49.5781614Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224309.xml 2023-01-11T22:44:49.5781984Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5782160Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5782541Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5782796Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5783031Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphbr85u3v 2023-01-11T22:44:49.5783298Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphbr85u3v/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5783318Z 2023-01-11T22:44:49.5783427Z Running tests... 2023-01-11T22:44:49.5783745Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5784071Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5784321Z test_reduce_stress_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5784543Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79889 2023-01-11T22:44:49.5784760Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79890 2023-01-11T22:44:49.5784957Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79891 2023-01-11T22:44:49.5785177Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79892 2023-01-11T22:44:49.5785549Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5785725Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5786106Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5786298Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5786666Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5786839Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5787214Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5787387Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5787751Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5787926Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5788300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5788486Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5788847Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5789018Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5789399Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5789570Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5789825Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpky5_iqre 2023-01-11T22:44:49.5790094Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpky5_iqre/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5790346Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj_407byd 2023-01-11T22:44:49.5790612Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj_407byd/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5790843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5791074Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5791328Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpext1pt2m 2023-01-11T22:44:49.5791594Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpext1pt2m/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5791892Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp72dr3g4e 2023-01-11T22:44:49.5792156Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp72dr3g4e/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5792381Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5792654Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5792764Z ok (5.830s) 
2023-01-11T22:44:49.5792784Z 2023-01-11T22:44:49.5793065Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5793179Z Ran 1 test in 5.830s 2023-01-11T22:44:49.5793199Z 2023-01-11T22:44:49.5793294Z OK 2023-01-11T22:44:49.5793314Z 2023-01-11T22:44:49.5793421Z Generating XML reports... 2023-01-11T22:44:49.5793857Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224316.xml 2023-01-11T22:44:49.5794230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5794407Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5794787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5794981Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5795235Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9vzxs4y3 2023-01-11T22:44:49.5795501Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9vzxs4y3/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5795521Z 2023-01-11T22:44:49.5795629Z Running tests... 2023-01-11T22:44:49.5795876Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5796186Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5796427Z test_round_robin (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5796645Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80100 2023-01-11T22:44:49.5796861Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80101 2023-01-11T22:44:49.5797076Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80102 2023-01-11T22:44:49.5797292Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80103 2023-01-11T22:44:49.5797661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5797819Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5798191Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5798372Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5798742Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5798915Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5799294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5799487Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5799863Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5800050Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5800408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5800596Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5801035Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5801213Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5801584Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5801816Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5802081Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5uqzlpiv 2023-01-11T22:44:49.5802353Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5uqzlpiv/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5802560Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5802816Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyobt9ig2 2023-01-11T22:44:49.5803090Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyobt9ig2/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5803341Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4gd37vwa 2023-01-11T22:44:49.5803601Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4gd37vwa/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5803853Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpou6nswvk 2023-01-11T22:44:49.5804120Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpou6nswvk/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5804346Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5804571Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5804776Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5805023Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5805273Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5805513Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:44:49.5805752Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:44:49.5806169Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5806570Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5806959Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5807342Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5807566Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:44:49.5807804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 2 2023-01-11T22:44:49.5808044Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:44:49.5808286Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 3 2023-01-11T22:44:49.5808682Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2023-01-11T22:44:49.5809072Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 
2023-01-11T22:44:49.5809459Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2023-01-11T22:44:49.5809919Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2023-01-11T22:44:49.5810162Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:44:49.5810383Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:44:49.5810665Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 2 2023-01-11T22:44:49.5810907Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 3 2023-01-11T22:44:49.5811307Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 2023-01-11T22:44:49.5811697Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 2023-01-11T22:44:49.5812083Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 2023-01-11T22:44:49.5812651Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:44:49.5813401Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. 
(function ProcessGroupRoundRobin) 2023-01-11T22:44:49.5813955Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:44:49.5814369Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 2023-01-11T22:44:49.5814905Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:44:49.5815012Z ok (4.220s) 2023-01-11T22:44:49.5815032Z 2023-01-11T22:44:49.5815301Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5815398Z Ran 1 test in 4.220s 2023-01-11T22:44:49.5815418Z 2023-01-11T22:44:49.5815512Z OK 2023-01-11T22:44:49.5815531Z 2023-01-11T22:44:49.5815656Z Generating XML reports... 
2023-01-11T22:44:49.5816094Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224324.xml 2023-01-11T22:44:49.5816471Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5816648Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5817031Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5817226Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5817478Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnu4_jfy8 2023-01-11T22:44:49.5817728Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnu4_jfy8/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5817748Z 2023-01-11T22:44:49.5817855Z Running tests... 2023-01-11T22:44:49.5818124Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5818437Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5818798Z test_round_robin_create_destroy (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5819018Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80315 2023-01-11T22:44:49.5819234Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80316 2023-01-11T22:44:49.5819510Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80317 2023-01-11T22:44:49.5819715Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80318 2023-01-11T22:44:49.5820099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5820278Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5820660Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5820854Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5821221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5821396Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5821770Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5821944Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5822313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5822485Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5822852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5823041Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5823400Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5823574Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5823944Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5824134Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5824372Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoefs23v7 2023-01-11T22:44:49.5824639Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoefs23v7/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5824890Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvfxi0lwq 2023-01-11T22:44:49.5825156Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvfxi0lwq/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5825387Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5825614Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5825870Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfx2wm98f 2023-01-11T22:44:49.5826136Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfx2wm98f/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5826385Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_d4_zz2a 2023-01-11T22:44:49.5826631Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_d4_zz2a/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5826857Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5827084Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5827416Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5827660Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:44:49.5827901Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2023-01-11T22:44:49.5828137Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2023-01-11T22:44:49.5828604Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5828994Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5829390Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5829786Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2023-01-11T22:44:49.5830030Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:44:49.5830271Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:44:49.5830511Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 2 2023-01-11T22:44:49.5830746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 3 2023-01-11T22:44:49.5831140Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2023-01-11T22:44:49.5831530Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 
2023-01-11T22:44:49.5831916Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2023-01-11T22:44:49.5832290Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2023-01-11T22:44:49.5832527Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:44:49.5832765Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 3 2023-01-11T22:44:49.5833034Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:44:49.5833269Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 2 2023-01-11T22:44:49.5833666Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 2023-01-11T22:44:49.5834058Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 2023-01-11T22:44:49.5834619Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:44:49.5835165Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:44:49.5835567Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 
2023-01-11T22:44:49.5835956Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes. 2023-01-11T22:44:49.5836499Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:44:49.5837131Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:44:49.5837380Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:44:49.5837625Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:44:49.5837867Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 3 2023-01-11T22:44:49.5838103Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 2 2023-01-11T22:44:49.5838516Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:4 with 4 nodes. 2023-01-11T22:44:49.5838909Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:4 with 4 nodes. 2023-01-11T22:44:49.5839303Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 4 nodes. 2023-01-11T22:44:49.5839693Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 4 nodes. 
2023-01-11T22:44:49.5839935Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:44:49.5840153Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:44:49.5840393Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 3 2023-01-11T22:44:49.5840630Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 2 2023-01-11T22:44:49.5841020Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 4 nodes. 2023-01-11T22:44:49.5841409Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 4 nodes. 2023-01-11T22:44:49.5841795Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:5 with 4 nodes. 2023-01-11T22:44:49.5842183Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:5 with 4 nodes. 2023-01-11T22:44:49.5842731Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:44:49.5843278Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:44:49.5843821Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). 
Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:44:49.5844348Z [W ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) 2023-01-11T22:44:49.5844517Z ok (4.528s) 2023-01-11T22:44:49.5844538Z 2023-01-11T22:44:49.5844817Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5844914Z Ran 1 test in 4.528s 2023-01-11T22:44:49.5844933Z 2023-01-11T22:44:49.5845031Z OK 2023-01-11T22:44:49.5845050Z 2023-01-11T22:44:49.5845176Z Generating XML reports... 2023-01-11T22:44:49.5845658Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224330.xml 2023-01-11T22:44:49.5846044Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5846223Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5846607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5846805Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5847045Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6cledy9i 2023-01-11T22:44:49.5847316Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6cledy9i/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5847336Z 2023-01-11T22:44:49.5847447Z Running tests... 
2023-01-11T22:44:49.5847716Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5848030Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5848274Z test_scatter_basics (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5848493Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80562 2023-01-11T22:44:49.5848709Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80563 2023-01-11T22:44:49.5848924Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80564 2023-01-11T22:44:49.5849125Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80565 2023-01-11T22:44:49.5849499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5849676Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5850062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5850255Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5850622Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5850796Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5851170Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5851345Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5851705Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5851878Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 
2023-01-11T22:44:49.5852255Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5852446Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5852804Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5853124Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5853510Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5853703Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5854042Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi8v7pas5 2023-01-11T22:44:49.5854315Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi8v7pas5/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5854546Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5854858Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbfj88qqd 2023-01-11T22:44:49.5855136Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbfj88qqd/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5855368Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5855622Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa2aepmy7 2023-01-11T22:44:49.5855888Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa2aepmy7/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5856100Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5856353Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbwvxbsfz 2023-01-11T22:44:49.5856619Z 
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbwvxbsfz/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5856847Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5856952Z ok (4.134s) 2023-01-11T22:44:49.5856972Z 2023-01-11T22:44:49.5857256Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5857371Z Ran 1 test in 4.134s 2023-01-11T22:44:49.5857391Z 2023-01-11T22:44:49.5857487Z OK 2023-01-11T22:44:49.5857506Z 2023-01-11T22:44:49.5857630Z Generating XML reports... 2023-01-11T22:44:49.5858047Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224337.xml 2023-01-11T22:44:49.5858421Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5858598Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5858977Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5859169Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5859421Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7dvubt0v 2023-01-11T22:44:49.5859685Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7dvubt0v/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5859705Z 2023-01-11T22:44:49.5859814Z Running tests... 2023-01-11T22:44:49.5860064Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5860380Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5860637Z test_scatter_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5860855Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80745 2023-01-11T22:44:49.5861072Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80746 2023-01-11T22:44:49.5861285Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80747 2023-01-11T22:44:49.5861498Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80748 2023-01-11T22:44:49.5861874Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5862048Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5862413Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5862604Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5863050Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5863224Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5863599Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5863837Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5864211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5864386Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5864738Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5864925Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5865297Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5865470Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5865840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5866025Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5866285Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcpwv56cz 2023-01-11T22:44:49.5866557Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcpwv56cz/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5866808Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzhhsnf92 2023-01-11T22:44:49.5867056Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzhhsnf92/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5867286Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5867537Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwqdb52k4 2023-01-11T22:44:49.5867799Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwqdb52k4/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5868024Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5868278Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6bdatkic 2023-01-11T22:44:49.5868545Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6bdatkic/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5868772Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5868976Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5869080Z ok (5.025s) 
2023-01-11T22:44:49.5869103Z 2023-01-11T22:44:49.5869377Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5869488Z Ran 1 test in 5.025s 2023-01-11T22:44:49.5869507Z 2023-01-11T22:44:49.5869598Z OK 2023-01-11T22:44:49.5869617Z 2023-01-11T22:44:49.5869741Z Generating XML reports... 2023-01-11T22:44:49.5870175Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224344.xml 2023-01-11T22:44:49.5870550Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5870726Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5871089Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5871281Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5871535Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp05z64awi 2023-01-11T22:44:49.5871870Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp05z64awi/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5871891Z 2023-01-11T22:44:49.5872001Z Running tests... 2023-01-11T22:44:49.5872277Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5872592Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5872884Z test_scatter_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5873092Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80932 2023-01-11T22:44:49.5873310Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80933 2023-01-11T22:44:49.5873526Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80934 2023-01-11T22:44:49.5873740Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80935 2023-01-11T22:44:49.5874128Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5874305Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5874684Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5874876Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5875249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5875408Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5875781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5875970Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5876333Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5876512Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5876883Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5877069Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5877432Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5877587Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5877968Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5878155Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5878409Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3tl_yyzj 2023-01-11T22:44:49.5878683Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3tl_yyzj/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5878937Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeikho3rz 2023-01-11T22:44:49.5879206Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeikho3rz/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5879435Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5879662Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5879900Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpofkn_vb0 2023-01-11T22:44:49.5880168Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpofkn_vb0/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5880423Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfi4que23 2023-01-11T22:44:49.5880793Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfi4que23/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5881019Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5881248Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5881351Z ok (4.136s) 
2023-01-11T22:44:49.5881372Z 2023-01-11T22:44:49.5881698Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5881801Z Ran 1 test in 4.137s 2023-01-11T22:44:49.5881821Z 2023-01-11T22:44:49.5881916Z OK 2023-01-11T22:44:49.5881935Z 2023-01-11T22:44:49.5882058Z Generating XML reports... 2023-01-11T22:44:49.5882503Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224351.xml 2023-01-11T22:44:49.5882875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5883056Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5883431Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5883621Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5883873Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1wuz4o23 2023-01-11T22:44:49.5884125Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1wuz4o23/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5884145Z 2023-01-11T22:44:49.5884255Z Running tests... 2023-01-11T22:44:49.5884523Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5884834Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5885075Z test_scatter_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5885298Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81115 2023-01-11T22:44:49.5885518Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81116 2023-01-11T22:44:49.5885737Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 81117 2023-01-11T22:44:49.5885934Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 81118 2023-01-11T22:44:49.5886313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5886491Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5886874Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5887066Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5887427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5887605Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5887976Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5888145Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5888507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5888680Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5889056Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5889243Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5889602Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5889851Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5890232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5890423Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5890659Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4pad7gzj 2023-01-11T22:44:49.5890978Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4pad7gzj/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5891246Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu639jckw 2023-01-11T22:44:49.5891499Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdl_7m93u 2023-01-11T22:44:49.5891747Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4mx9hl4u 2023-01-11T22:44:49.5892014Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu639jckw/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5892274Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdl_7m93u/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5892527Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4mx9hl4u/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5892756Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5893135Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5893367Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5893593Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5893699Z ok (4.820s) 
2023-01-11T22:44:49.5893719Z 2023-01-11T22:44:49.5894000Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5894115Z Ran 1 test in 4.820s 2023-01-11T22:44:49.5894139Z 2023-01-11T22:44:49.5894234Z OK 2023-01-11T22:44:49.5894253Z 2023-01-11T22:44:49.5894377Z Generating XML reports... 2023-01-11T22:44:49.5894796Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224357.xml 2023-01-11T22:44:49.5895166Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5895347Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5895727Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5895919Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5896173Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7qqewoh5 2023-01-11T22:44:49.5896440Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7qqewoh5/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5896463Z 2023-01-11T22:44:49.5896573Z Running tests... 2023-01-11T22:44:49.5896821Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5897133Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5897445Z test_scatter_stress_cuda (__main__.ProcessGroupGlooTest) ... 
skip: Test is flaky, see https://github.com/pytorch/pytorch/issues/15963 (0.001s) 2023-01-11T22:44:49.5897467Z 2023-01-11T22:44:49.5897729Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5897842Z Ran 1 test in 0.001s 2023-01-11T22:44:49.5897861Z 2023-01-11T22:44:49.5897969Z OK (skipped=1) 2023-01-11T22:44:49.5897988Z 2023-01-11T22:44:49.5898111Z Generating XML reports... 2023-01-11T22:44:49.5898537Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224405.xml 2023-01-11T22:44:49.5898906Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5899159Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5899549Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5899742Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5900060Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi60lfo5c 2023-01-11T22:44:49.5900341Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi60lfo5c/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5900361Z 2023-01-11T22:44:49.5900473Z Running tests... 2023-01-11T22:44:49.5900746Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5901058Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5901312Z test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5901518Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81355 2023-01-11T22:44:49.5901736Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81356 2023-01-11T22:44:49.5901951Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 81357 2023-01-11T22:44:49.5902168Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 81358 2023-01-11T22:44:49.5902544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5902718Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5903098Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5903290Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5903639Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5903815Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5904189Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5904376Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5904739Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5904912Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5905290Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5905475Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5905833Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5905991Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5906362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5906551Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5906808Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp87hsonr9 2023-01-11T22:44:49.5907075Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp87hsonr9/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5907327Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbrud2ulw 2023-01-11T22:44:49.5907594Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbrud2ulw/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5907846Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuqvj34wu 2023-01-11T22:44:49.5908163Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuqvj34wu/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5908418Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd0j0n5sw 2023-01-11T22:44:49.5908685Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd0j0n5sw/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5908956Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5909191Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5909411Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5909638Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5909743Z ok (4.157s) 
2023-01-11T22:44:49.5909763Z 2023-01-11T22:44:49.5910043Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5910143Z Ran 1 test in 4.157s 2023-01-11T22:44:49.5910161Z 2023-01-11T22:44:49.5910259Z OK 2023-01-11T22:44:49.5910279Z 2023-01-11T22:44:49.5910404Z Generating XML reports... 2023-01-11T22:44:49.5910844Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224407.xml 2023-01-11T22:44:49.5911219Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5911396Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5911775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5911966Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5912202Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_rjpoukk 2023-01-11T22:44:49.5912477Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_rjpoukk/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5912497Z 2023-01-11T22:44:49.5912606Z Running tests... 2023-01-11T22:44:49.5912873Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5913185Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5913466Z test_sparse_allreduce_basics (__main__.ProcessGroupGlooTest) ... skip: intermittent failures on Windows, in CI (0.000s) 2023-01-11T22:44:49.5913486Z 2023-01-11T22:44:49.5913748Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5913861Z Ran 1 test in 0.000s 2023-01-11T22:44:49.5913880Z 2023-01-11T22:44:49.5913989Z OK (skipped=1) 2023-01-11T22:44:49.5914008Z 2023-01-11T22:44:49.5914115Z Generating XML reports... 
2023-01-11T22:44:49.5914547Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224413.xml 2023-01-11T22:44:49.5914924Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5915101Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5915483Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5915676Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5915929Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqb_lnof9 2023-01-11T22:44:49.5916196Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqb_lnof9/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5916215Z 2023-01-11T22:44:49.5916324Z Running tests... 2023-01-11T22:44:49.5916571Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5916882Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5917220Z test_sparse_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5917440Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81571 2023-01-11T22:44:49.5917660Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81572 2023-01-11T22:44:49.5917876Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 81573 2023-01-11T22:44:49.5918133Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 81574 2023-01-11T22:44:49.5918523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5918680Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5919060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5919256Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5919626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5919801Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5920172Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5920363Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5920725Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5920899Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5921250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5921437Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5921803Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5921974Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5922350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5922540Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5922796Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpya3c3mxr 2023-01-11T22:44:49.5923063Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpya3c3mxr/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5923301Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxp0m006_ 2023-01-11T22:44:49.5923566Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxp0m006_/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5923822Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzm47v6ax 2023-01-11T22:44:49.5924088Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzm47v6ax/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5924339Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm4v6pygf 2023-01-11T22:44:49.5924603Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm4v6pygf/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5924833Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5925056Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5925282Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5925490Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5925591Z ok (5.464s) 
2023-01-11T22:44:49.5925668Z 2023-01-11T22:44:49.5925958Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5926074Z Ran 1 test in 5.464s 2023-01-11T22:44:49.5926093Z 2023-01-11T22:44:49.5926186Z OK 2023-01-11T22:44:49.5926204Z 2023-01-11T22:44:49.5926327Z Generating XML reports... 2023-01-11T22:44:49.5926760Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224415.xml 2023-01-11T22:44:49.5927181Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5927346Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5927731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5927922Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5928175Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj54ndxr0 2023-01-11T22:44:49.5928445Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj54ndxr0/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5928465Z 2023-01-11T22:44:49.5928575Z Running tests... 2023-01-11T22:44:49.5928839Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5929152Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5929417Z test_sparse_allreduce_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5929618Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81938 2023-01-11T22:44:49.5929836Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81939 2023-01-11T22:44:49.5930050Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 81940 2023-01-11T22:44:49.5930264Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 81941 2023-01-11T22:44:49.5930641Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5930816Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5931196Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5931391Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5931740Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5931914Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5932288Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5932475Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5932838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5933204Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5933616Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5933806Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5934171Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5934327Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5934706Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5934896Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5935151Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa43tet1q 2023-01-11T22:44:49.5935514Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa43tet1q/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5935770Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv5bilxaj 2023-01-11T22:44:49.5936041Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv5bilxaj/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5936349Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp07l1hs_m 2023-01-11T22:44:49.5936608Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp07l1hs_m/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5936839Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:44:49.5937070Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2023-01-11T22:44:49.5937297Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:44:49.5937554Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg4vzvrjt 2023-01-11T22:44:49.5937819Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg4vzvrjt/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5938041Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2023-01-11T22:44:49.5938143Z ok (4.141s) 
2023-01-11T22:44:49.5938163Z 2023-01-11T22:44:49.5938449Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5938546Z Ran 1 test in 4.141s 2023-01-11T22:44:49.5938565Z 2023-01-11T22:44:49.5938661Z OK 2023-01-11T22:44:49.5938680Z 2023-01-11T22:44:49.5938804Z Generating XML reports... 2023-01-11T22:44:49.5939239Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224423.xml 2023-01-11T22:44:49.5939610Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5939793Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5940174Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5940365Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5940602Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwh0izyxh 2023-01-11T22:44:49.5940870Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwh0izyxh/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5940891Z 2023-01-11T22:44:49.5941000Z Running tests... 2023-01-11T22:44:49.5941268Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5941580Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5941890Z test_forward_backward (__main__.ReducerTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5942300Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
2023-01-11T22:44:49.5942405Z ok (0.007s) 2023-01-11T22:44:49.5942425Z 2023-01-11T22:44:49.5942688Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5942782Z Ran 1 test in 0.012s 2023-01-11T22:44:49.5942801Z 2023-01-11T22:44:49.5942898Z OK 2023-01-11T22:44:49.5942917Z 2023-01-11T22:44:49.5943044Z Generating XML reports... 2023-01-11T22:44:49.5943437Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224430.xml 2023-01-11T22:44:49.5943806Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5943983Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5944364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5944636Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5944875Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvkaurrs4 2023-01-11T22:44:49.5945145Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvkaurrs4/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5945165Z 2023-01-11T22:44:49.5945320Z Running tests... 2023-01-11T22:44:49.5945606Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5945920Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5946246Z test_forward_backward_optimizer (__main__.ReducerTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5946645Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
2023-01-11T22:44:49.5947434Z [W reducer.cpp:1310] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2023-01-11T22:44:49.5947541Z ok (0.011s) 2023-01-11T22:44:49.5947560Z 2023-01-11T22:44:49.5947823Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5947936Z Ran 1 test in 0.022s 2023-01-11T22:44:49.5947955Z 2023-01-11T22:44:49.5948031Z OK 2023-01-11T22:44:49.5948049Z 2023-01-11T22:44:49.5948174Z Generating XML reports... 2023-01-11T22:44:49.5948571Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224432.xml 2023-01-11T22:44:49.5948942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5949118Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5949499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5949692Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5949948Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1q84dr7g 2023-01-11T22:44:49.5950198Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1q84dr7g/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5950235Z 2023-01-11T22:44:49.5950327Z Running tests... 
2023-01-11T22:44:49.5950595Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5950911Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5951248Z test_forward_backward_unused_parameters (__main__.ReducerTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5951646Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:44:49.5951753Z ok (0.007s) 2023-01-11T22:44:49.5951772Z 2023-01-11T22:44:49.5952036Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5952150Z Ran 1 test in 0.012s 2023-01-11T22:44:49.5952169Z 2023-01-11T22:44:49.5952244Z OK 2023-01-11T22:44:49.5952280Z 2023-01-11T22:44:49.5952390Z Generating XML reports... 2023-01-11T22:44:49.5952786Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224434.xml 2023-01-11T22:44:49.5953246Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5953424Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5953803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5953997Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5954302Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw89paw4e 2023-01-11T22:44:49.5954580Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw89paw4e/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5954600Z 2023-01-11T22:44:49.5954691Z Running tests... 
2023-01-11T22:44:49.5954966Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5955277Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5955601Z test_multi_dtype_multi_bucket (__main__.ReducerTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5955998Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:44:49.5956100Z ok (0.004s) 2023-01-11T22:44:49.5956120Z 2023-01-11T22:44:49.5956380Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5956496Z Ran 1 test in 0.012s 2023-01-11T22:44:49.5956516Z 2023-01-11T22:44:49.5956592Z OK 2023-01-11T22:44:49.5956628Z 2023-01-11T22:44:49.5956734Z Generating XML reports... 2023-01-11T22:44:49.5957128Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224436.xml 2023-01-11T22:44:49.5957499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5957675Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5958057Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5958247Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5958498Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_9n530sh 2023-01-11T22:44:49.5958764Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_9n530sh/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5958783Z 2023-01-11T22:44:49.5958874Z Running tests... 
2023-01-11T22:44:49.5959141Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5959453Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5959773Z test_multi_dtype_single_bucket (__main__.ReducerTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5960173Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:44:49.5960277Z ok (0.007s) 2023-01-11T22:44:49.5960296Z 2023-01-11T22:44:49.5960555Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5960669Z Ran 1 test in 0.011s 2023-01-11T22:44:49.5960688Z 2023-01-11T22:44:49.5960782Z OK 2023-01-11T22:44:49.5960800Z 2023-01-11T22:44:49.5960910Z Generating XML reports... 2023-01-11T22:44:49.5961307Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224438.xml 2023-01-11T22:44:49.5961679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5961855Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5962233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5962491Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5962748Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcrd0vgl2 2023-01-11T22:44:49.5963021Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcrd0vgl2/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5963041Z 2023-01-11T22:44:49.5963151Z Running tests... 
2023-01-11T22:44:49.5963456Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5963782Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5964108Z test_single_dtype_single_bucket (__main__.ReducerTest) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5964506Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:44:49.5964617Z ok (0.004s) 2023-01-11T22:44:49.5964636Z 2023-01-11T22:44:49.5964896Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5965008Z Ran 1 test in 0.012s 2023-01-11T22:44:49.5965027Z 2023-01-11T22:44:49.5965120Z OK 2023-01-11T22:44:49.5965139Z 2023-01-11T22:44:49.5965244Z Generating XML reports... 2023-01-11T22:44:49.5965683Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224441.xml 2023-01-11T22:44:49.5966058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5966238Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5966620Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5966813Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5967073Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpejf_67c0 2023-01-11T22:44:49.5967339Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpejf_67c0/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5967359Z 2023-01-11T22:44:49.5967467Z Running tests... 
2023-01-11T22:44:49.5967716Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5968029Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5968258Z test_logging_init (__main__.RendezvousEnvTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5968506Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:44:49.5968909Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2023-01-11T22:44:49.5969011Z ok (1.657s) 2023-01-11T22:44:49.5969030Z 2023-01-11T22:44:49.5969293Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5969409Z Ran 1 test in 1.657s 2023-01-11T22:44:49.5969429Z 2023-01-11T22:44:49.5969505Z OK 2023-01-11T22:44:49.5969523Z 2023-01-11T22:44:49.5969648Z Generating XML reports... 2023-01-11T22:44:49.5970065Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-RendezvousEnvTest-20230111224443.xml 2023-01-11T22:44:49.5970437Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:44:49.5970614Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:44:49.5970995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:44:49.5971187Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:44:49.5971444Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5x5_blxl 2023-01-11T22:44:49.5971786Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5x5_blxl/_remote_module_non_scriptable.py 2023-01-11T22:44:49.5971806Z 2023-01-11T22:44:49.5971898Z Running tests... 
2023-01-11T22:44:49.5972172Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5972484Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2023-01-11T22:44:49.5972771Z test_default_store_timeout_gloo (__main__.TimeoutTest) ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:44:49.5973693Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/74714 for allplatform(s) . If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.623s) 2023-01-11T22:44:49.5973715Z 2023-01-11T22:44:49.5973980Z ---------------------------------------------------------------------- 2023-01-11T22:44:49.5974100Z Ran 1 test in 1.623s 2023-01-11T22:44:49.5974119Z 2023-01-11T22:44:49.5974229Z OK (skipped=1) 2023-01-11T22:44:49.5974248Z 2023-01-11T22:44:49.5974372Z Generating XML reports... 2023-01-11T22:44:49.5974749Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-TimeoutTest-20230111224447.xml 2023-01-11T22:44:49.5974787Z 2023-01-11T22:44:49.5975188Z ##[endgroup] 2023-01-11T22:44:49.5975626Z FINISHED PRINTING LOG FILE of distributed/test_c10d_gloo (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_gloo_6tcbezuo) 2023-01-11T22:44:49.5975647Z 2023-01-11T22:44:49.5975911Z Running distributed/fsdp/test_fsdp_core ... [2023-01-11 22:44:49.426628] 2023-01-11T22:44:49.5976379Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_core.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 22:44:49.426981] 2023-01-11T22:54:20.9884784Z 2023-01-11T22:54:20.9887378Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_core 2023-01-11T22:54:20.9888341Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_core (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_core_25gv4ax7) 2023-01-11T22:54:20.9934023Z 2023-01-11T22:54:20.9934390Z Running tests... 2023-01-11T22:54:20.9935306Z ---------------------------------------------------------------------- 2023-01-11T22:54:20.9939541Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_core 2023-01-11T22:54:20.9940203Z test_pre_backward_hook_registration_after_state_dict (__main__.TestHooks) 2023-01-11T22:54:20.9941137Z Tests that FSDP pre-backward hooks are registered on forward pass ... INFO:numba.cuda.cudadrv.driver:init 2023-01-11T22:54:20.9941805Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82461 2023-01-11T22:54:20.9942446Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82462 2023-01-11T22:54:20.9943160Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:20.9945603Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:20.9946323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:20.9946821Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:20.9947417Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:20.9947858Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:20.9948446Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 
2023-01-11T22:54:20.9948921Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:20.9949379Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:20.9950192Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:20.9950872Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:20.9951565Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:20.9952208Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:20.9952687Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:20.9953973Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:20.9954875Z warnings.warn( 2023-01-11T22:54:20.9956047Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:20.9956821Z warnings.warn( 2023-01-11T22:54:20.9957076Z dist init r=1, world=2 2023-01-11T22:54:20.9957312Z dist init r=0, world=2 2023-01-11T22:54:20.9957557Z ok (5.657s) 2023-01-11T22:54:20.9957897Z test_pre_backward_hook_registration_cuda_first_False (__main__.TestHooks) 2023-01-11T22:54:20.9958558Z Tests that FSDP pre-backward hooks are registered on forward pass ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82544 2023-01-11T22:54:20.9959113Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82545 2023-01-11T22:54:20.9959749Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:20.9960207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:20.9960770Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:20.9961246Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:20.9961839Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:20.9962290Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:20.9962851Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:20.9963327Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:20.9963787Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:20.9964265Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:20.9964934Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:20.9965624Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:20.9966153Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:20.9966607Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:20.9967970Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:20.9968807Z warnings.warn( 2023-01-11T22:54:20.9969973Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:20.9970834Z warnings.warn( 2023-01-11T22:54:20.9971089Z dist init r=0, world=2 2023-01-11T22:54:20.9971346Z dist init r=1, world=2 2023-01-11T22:54:20.9971570Z ok (4.012s) 2023-01-11T22:54:20.9971904Z test_pre_backward_hook_registration_cuda_first_True (__main__.TestHooks) 2023-01-11T22:54:20.9972576Z Tests that FSDP pre-backward hooks are registered on forward pass ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82627 2023-01-11T22:54:20.9973639Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82628 2023-01-11T22:54:20.9974274Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:20.9974730Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:20.9975311Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:20.9975770Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:20.9976357Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:20.9976803Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:20.9977373Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:20.9977832Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:20.9978304Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:20.9978810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:20.9979446Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:20.9980145Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:20.9980674Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:20.9981156Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:20.9981501Z dist init r=1, world=2 2023-01-11T22:54:20.9981761Z dist init r=0, world=2 2023-01-11T22:54:20.9982009Z ok (4.012s) 2023-01-11T22:54:20.9982354Z test_register_functions_called_cuda_first_False_mixed_precision_False (__main__.TestHooks) 2023-01-11T22:54:20.9982909Z Tests that ``_register_{pre|post}_backward_hooks()`` are called ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82710 2023-01-11T22:54:20.9983449Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82711 2023-01-11T22:54:20.9984062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:20.9984619Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:20.9985214Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:20.9985687Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:20.9986267Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:20.9986768Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:20.9987373Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:20.9987842Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:20.9988404Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:20.9989269Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to 
store for rank: 1 2023-01-11T22:54:20.9990658Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:20.9991579Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:20.9992081Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:20.9992561Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:20.9993826Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:20.9994616Z warnings.warn( 2023-01-11T22:54:20.9995781Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:20.9996525Z warnings.warn( 2023-01-11T22:54:20.9996779Z dist init r=1, world=2 2023-01-11T22:54:20.9997031Z dist init r=0, world=2 2023-01-11T22:54:20.9997257Z ok (3.912s) 2023-01-11T22:54:20.9997611Z test_register_functions_called_cuda_first_False_mixed_precision_True (__main__.TestHooks) 2023-01-11T22:54:20.9998158Z Tests that ``_register_{pre|post}_backward_hooks()`` are called ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82789 2023-01-11T22:54:20.9998693Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82790 2023-01-11T22:54:20.9999289Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:20.9999741Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0000324Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0000797Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0001359Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0001808Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0002382Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0002944Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0003402Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.0003905Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.0004619Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0005308Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.0005832Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.0006305Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.0007431Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:54:21.0008123Z warnings.warn( 2023-01-11T22:54:21.0009145Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:54:21.0009846Z warnings.warn( 2023-01-11T22:54:21.0011008Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0011786Z warnings.warn( 2023-01-11T22:54:21.0013500Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0014816Z warnings.warn( 2023-01-11T22:54:21.0015258Z dist init r=1, world=2 2023-01-11T22:54:21.0015713Z dist init r=0, world=2 2023-01-11T22:54:21.0016158Z ok (3.914s) 2023-01-11T22:54:21.0016866Z test_register_functions_called_cuda_first_True_mixed_precision_False (__main__.TestHooks) 2023-01-11T22:54:21.0017829Z Tests that ``_register_{pre|post}_backward_hooks()`` are called ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82868 2023-01-11T22:54:21.0018778Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82869 2023-01-11T22:54:21.0019892Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0020728Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0021809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0022678Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0023777Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0024861Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0025929Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0026797Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0027672Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.0028752Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.0030067Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0031331Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0032310Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.0033209Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.0033874Z dist init r=0, world=2 2023-01-11T22:54:21.0034339Z dist init r=1, world=2 2023-01-11T22:54:21.0034789Z ok (4.013s) 2023-01-11T22:54:21.0035509Z test_register_functions_called_cuda_first_True_mixed_precision_True (__main__.TestHooks) 2023-01-11T22:54:21.0036535Z Tests that ``_register_{pre|post}_backward_hooks()`` are called ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82947 2023-01-11T22:54:21.0037521Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82948 2023-01-11T22:54:21.0038668Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0039504Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0040575Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0041460Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0042536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0043327Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0044385Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0045232Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0046080Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.0046992Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.0048200Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0049565Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.0050524Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.0051400Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.0054079Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:54:21.0055004Z warnings.warn( 2023-01-11T22:54:21.0056064Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:54:21.0057086Z warnings.warn( 2023-01-11T22:54:21.0057440Z dist init r=1, world=2 2023-01-11T22:54:21.0057845Z dist init r=0, world=2 2023-01-11T22:54:21.0058091Z ok (4.013s) 2023-01-11T22:54:21.0058775Z test_transformer_no_grad_mixed_precision_False (__main__.TestNoGrad) 2023-01-11T22:54:21.0060083Z Tests that for an FSDP-wrapped transformer model with shared ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83026 2023-01-11T22:54:21.0061145Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83027 2023-01-11T22:54:21.0062308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0063218Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0064422Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0065357Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0066395Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0067226Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0068320Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0069243Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0070069Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.0070998Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.0072326Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0073460Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.0074369Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.0075319Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.0076998Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0078391Z warnings.warn( 2023-01-11T22:54:21.0080628Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0082124Z warnings.warn( 2023-01-11T22:54:21.0082545Z dist init r=0, world=2 2023-01-11T22:54:21.0083014Z dist init r=1, world=2 2023-01-11T22:54:21.0083433Z ok (4.012s) 2023-01-11T22:54:21.0084019Z test_transformer_no_grad_mixed_precision_True (__main__.TestNoGrad) 2023-01-11T22:54:21.0085231Z Tests that for an FSDP-wrapped transformer model with shared ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83109 2023-01-11T22:54:21.0086420Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83110 2023-01-11T22:54:21.0087543Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0088426Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0089621Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0090619Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0091800Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0092701Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0094512Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0095488Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0096301Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.0097291Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.0098537Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0099944Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.0100944Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.0101886Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.0104149Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:54:21.0105505Z warnings.warn( 2023-01-11T22:54:21.0107448Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:54:21.0108793Z warnings.warn( 2023-01-11T22:54:21.0111013Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0112481Z warnings.warn( 2023-01-11T22:54:21.0114717Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0116146Z warnings.warn( 2023-01-11T22:54:21.0116558Z dist init r=1, world=2 2023-01-11T22:54:21.0116978Z dist init r=0, world=2 2023-01-11T22:54:21.0117450Z ok (4.013s) 2023-01-11T22:54:21.0118012Z test_param_change_after_init_mixed_precision_False (__main__.TestParamInit) 2023-01-11T22:54:21.0119497Z Tests that changing FSDP model parameter values in-place after FSDP ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83192 2023-01-11T22:54:21.0120522Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83193 2023-01-11T22:54:21.0121656Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0122602Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0123742Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0124667Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0125756Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0126537Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0127569Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0128392Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0129179Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.0130106Z INFO:torch.distributed.distributed_c10d:Added 
key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.0131352Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0132649Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0134234Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.0135130Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.0137493Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0138942Z warnings.warn( 2023-01-11T22:54:21.0141145Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.0142637Z warnings.warn( 2023-01-11T22:54:21.0143042Z dist init r=1, world=2 2023-01-11T22:54:21.0143492Z dist init r=0, world=2 2023-01-11T22:54:21.0143897Z ok (3.912s) 2023-01-11T22:54:21.0144492Z test_param_change_after_init_mixed_precision_True (__main__.TestParamInit) 2023-01-11T22:54:21.0145743Z Tests that changing FSDP model parameter values in-place after FSDP ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83271 2023-01-11T22:54:21.0146782Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83272 2023-01-11T22:54:21.0148181Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0149066Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0150214Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0151283Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0152352Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0153177Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0154244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0155227Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0156096Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.0157021Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.0158244Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.0159582Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0160490Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.0161339Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.0163491Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:54:21.0164914Z warnings.warn( 2023-01-11T22:54:21.0166308Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:69: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2023-01-11T22:54:21.0167696Z warnings.warn( 2023-01-11T22:54:21.0169870Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.0171367Z warnings.warn( 2023-01-11T22:54:21.0174164Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0175664Z warnings.warn( 2023-01-11T22:54:21.0176130Z dist init r=0, world=2 2023-01-11T22:54:21.0176631Z dist init r=1, world=2 2023-01-11T22:54:21.0183536Z ok (4.012s) 2023-01-11T22:54:21.0184252Z test_delayed_optim_step_offload_false_no_shard (__main__.TestParityWithDDP) 2023-01-11T22:54:21.0185266Z Tests the FSDP forward, backward, and optimizer step runtime by ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83350 2023-01-11T22:54:21.0186274Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83351 2023-01-11T22:54:21.0187506Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0188506Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0189599Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0190432Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0191610Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0192516Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0193559Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 
2023-01-11T22:54:21.0194392Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0195252Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.0196175Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.0197437Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0198703Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0199732Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.0200636Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.0201562Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0202508Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0204956Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0206427Z warnings.warn( 2023-01-11T22:54:21.0208758Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0210339Z warnings.warn( 2023-01-11T22:54:21.0211062Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0211951Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0213421Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0214355Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0215261Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0216149Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0217057Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0217938Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0218816Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0219749Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0220823Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0221677Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0222580Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0223483Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.0224478Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0225358Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0227248Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0229627Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0232147Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0234648Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0237018Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.0239289Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0241607Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0243947Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0246346Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0248654Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0251147Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0254324Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0256881Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0259414Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0261756Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0264075Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0266493Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0268883Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0271214Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0273501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0275832Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0278143Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.0280773Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0283206Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0285623Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0287916Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0290199Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0292526Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0295249Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0297562Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0299894Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0302219Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0304504Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0306876Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0309227Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0311757Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0314070Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0316405Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0318857Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.0377663Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0380201Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0382600Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0384908Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0387222Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0388583Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0389506Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.0390413Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0391285Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0392202Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0393149Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0394313Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0395164Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0396046Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0396925Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0397945Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0398849Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0399841Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0400749Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0401604Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0402467Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0403327Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0404200Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0405082Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.0405955Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0406880Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0407797Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0408609Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0409504Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0411434Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0414140Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0416539Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0418918Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.0421261Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0423614Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0426117Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0428540Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0430894Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0433181Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0435506Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0437884Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0440338Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0442935Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0445352Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0447665Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0449917Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0452219Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0455116Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0457626Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0459952Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.0462456Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0464964Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0467308Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0469663Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0471936Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0474273Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0476638Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0478972Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0481277Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0483821Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0486272Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0488623Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0490977Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0493595Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0495981Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0498379Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0500824Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.0503395Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0505955Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0508383Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0510776Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0513264Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0515798Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0517257Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0518729Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0520042Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0521885Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0523519Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0524754Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0526011Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0527396Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0528728Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0530197Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0531643Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.0533755Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0535006Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0536206Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0537427Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0538636Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0539849Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0541069Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0541781Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0542276Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0542763Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0543244Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0543705Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0544190Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0544561Z dist init r=1, world=2 2023-01-11T22:54:21.0544805Z dist init r=0, world=2 2023-01-11T22:54:21.0545048Z ok (16.436s) 2023-01-11T22:54:21.0545388Z test_delayed_optim_step_offload_false_none (__main__.TestParityWithDDP) 2023-01-11T22:54:21.0545976Z Tests the FSDP forward, backward, and optimizer step runtime by ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83433 2023-01-11T22:54:21.0546523Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83434 2023-01-11T22:54:21.0547162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0547624Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0548291Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0548774Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0549361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0549864Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0550439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0550913Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0551370Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.0551876Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.0552518Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0553221Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.0553748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.0554209Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.0554688Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0555182Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0556470Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0557249Z warnings.warn( 2023-01-11T22:54:21.0558410Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0559197Z warnings.warn( 2023-01-11T22:54:21.0559585Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0560073Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0560563Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.0561021Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0561510Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0561991Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0562443Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0562931Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0563409Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0563880Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0564406Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0564882Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0565356Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0565809Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0566326Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0566812Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0567824Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0569071Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0570276Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0571518Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0572740Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0574718Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0577031Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.0579563Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0580444Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0580937Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0581402Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0581888Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0582360Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0582810Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0583390Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0583863Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0584341Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0584799Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0585325Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0585805Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0586261Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0586729Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.0587194Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0587663Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0588120Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0588588Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0589056Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0589509Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0589974Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0590440Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0590909Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0591355Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0592368Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0593603Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0594830Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0596047Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0597277Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0598496Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0599779Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0601047Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.0602281Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0603803Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:237: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.0604621Z (rank, world_num_valid_indices[rank]) 2023-01-11T22:54:21.0605026Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0605490Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0605968Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0606447Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0606929Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0607384Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0607740Z dist init r=1, world=2 2023-01-11T22:54:21.0607994Z dist init r=0, world=2 2023-01-11T22:54:21.0608215Z ok (28.860s) 2023-01-11T22:54:21.0608567Z test_delayed_optim_step_offload_false_shard_grad_op (__main__.TestParityWithDDP) 2023-01-11T22:54:21.0609118Z Tests the FSDP forward, backward, and optimizer step runtime by ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83516 2023-01-11T22:54:21.0609631Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83517 2023-01-11T22:54:21.0610246Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0610701Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0611283Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0611735Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0612312Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0612763Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0613592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0614045Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0614504Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.0615007Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.0615750Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0616443Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.0616967Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.0617532Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.0618004Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0618485Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0619770Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0620556Z warnings.warn( 2023-01-11T22:54:21.0621716Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0622455Z warnings.warn( 2023-01-11T22:54:21.0622829Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0623310Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0623795Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.0624254Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0624732Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0625205Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0625661Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0626131Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0626601Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0627069Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0627528Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0627997Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0628465Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0628919Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0629393Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0629859Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0630859Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0632196Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0633472Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0634725Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0635951Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0637180Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0638391Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.0639618Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0640347Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0640818Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0641296Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0641772Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0642249Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0642706Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0643182Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0643654Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0644109Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0644581Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0645054Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0645526Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0645981Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0646449Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.0646917Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0647436Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0647914Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0648385Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0648854Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0649355Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0649837Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0650309Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0650762Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0651238Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0652238Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0653672Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0654913Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0656143Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0657368Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0658590Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0659819Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0661039Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.0662257Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0663943Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:237: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.0664778Z (rank, world_num_valid_indices[rank]) 2023-01-11T22:54:21.0665183Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0665652Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0666130Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0666610Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0667074Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0667552Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0667907Z dist init r=0, world=2 2023-01-11T22:54:21.0668159Z dist init r=1, world=2 2023-01-11T22:54:21.0668381Z ok (28.862s) 2023-01-11T22:54:21.0668724Z test_delayed_optim_step_offload_true_no_shard (__main__.TestParityWithDDP) 2023-01-11T22:54:21.0669872Z Tests the FSDP forward, backward, and optimizer step runtime by ... 
skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82490 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T22:54:21.0670652Z test_delayed_optim_step_offload_true_none (__main__.TestParityWithDDP) 2023-01-11T22:54:21.0671167Z Tests the FSDP forward, backward, and optimizer step runtime by ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83599 2023-01-11T22:54:21.0671695Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83600 2023-01-11T22:54:21.0672304Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0672737Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0673317Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0673783Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0674361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.0674785Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.0675364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.0675824Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.0676258Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.0676759Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.0677414Z INFO:torch.distributed.distributed_c10d:Rank 
0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0678099Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.0678606Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.0679075Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.0679620Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0680105Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0681414Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.0682211Z warnings.warn( 2023-01-11T22:54:21.0683377Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.0684150Z warnings.warn( 2023-01-11T22:54:21.0684417Z File "", line 1, in 2023-01-11T22:54:21.0684770Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0685151Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0685524Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0685879Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0686267Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0686603Z self.run() 2023-01-11T22:54:21.0686916Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0687289Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0687807Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0688198Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.0688709Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0689103Z getattr(self, test_name)() 2023-01-11T22:54:21.0689618Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0689968Z fn() 2023-01-11T22:54:21.0690462Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0690855Z test(self, **param_kwargs) 2023-01-11T22:54:21.0691368Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0691744Z return func(*args, **kwargs) 2023-01-11T22:54:21.0692147Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0692520Z self.run_subtests( 2023-01-11T22:54:21.0693193Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0693629Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0694188Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0694610Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0695145Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0695541Z output = model(*input) 2023-01-11T22:54:21.0696023Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0696488Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0697041Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.0697495Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0698063Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.0698505Z _lazy_init(state, module) 2023-01-11T22:54:21.0699034Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0699470Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0700040Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0700475Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.0700991Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0701369Z return func(*args, **kwargs) 2023-01-11T22:54:21.0701884Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0702268Z p_assert( 2023-01-11T22:54:21.0702739Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0703106Z traceback.print_stack() 2023-01-11T22:54:21.0703390Z File "", line 1, in 2023-01-11T22:54:21.0703760Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0704114Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0704484Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0704855Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0705246Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0705564Z self.run() 2023-01-11T22:54:21.0705897Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0706259Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0706758Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0707147Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.0707676Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0708051Z getattr(self, test_name)() 2023-01-11T22:54:21.0708563Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0708930Z fn() 2023-01-11T22:54:21.0709421Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0709805Z test(self, **param_kwargs) 2023-01-11T22:54:21.0710310Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.0710699Z return func(*args, **kwargs) 2023-01-11T22:54:21.0711088Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0711462Z self.run_subtests( 2023-01-11T22:54:21.0711971Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0712394Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0712926Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0713344Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0713900Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0714351Z output = model(*input) 2023-01-11T22:54:21.0714834Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0715221Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0715821Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.0716270Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0716838Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.0717233Z _lazy_init(state, module) 2023-01-11T22:54:21.0717720Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0718152Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0718748Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0719183Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.0719673Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0720055Z return func(*args, **kwargs) 2023-01-11T22:54:21.0720592Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0720962Z p_assert( 2023-01-11T22:54:21.0721431Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0721812Z traceback.print_stack() 2023-01-11T22:54:21.0722190Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0722681Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0723065Z File "", line 1, in 2023-01-11T22:54:21.0723441Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0723795Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0724165Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0724536Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0724909Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0725244Z self.run() 2023-01-11T22:54:21.0725575Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0725940Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0726438Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0726822Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.0727356Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0727727Z getattr(self, test_name)() 2023-01-11T22:54:21.0728238Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0728606Z fn() 2023-01-11T22:54:21.0729081Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0729479Z test(self, **param_kwargs) 2023-01-11T22:54:21.0729987Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0730375Z return func(*args, **kwargs) 2023-01-11T22:54:21.0730757Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0731130Z self.run_subtests( 2023-01-11T22:54:21.0731719Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0732122Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0732670Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0733276Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0733916Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0734311Z output = model(*input) 2023-01-11T22:54:21.0734793Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0735182Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0735708Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.0736167Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0736730Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.0737128Z _lazy_init(state, module) 2023-01-11T22:54:21.0737616Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0738052Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0738644Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0739057Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.0739569Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0739949Z return func(*args, **kwargs) 2023-01-11T22:54:21.0740482Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0740851Z p_assert( 2023-01-11T22:54:21.0741325Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0741707Z traceback.print_stack() 2023-01-11T22:54:21.0741977Z File "", line 1, in 2023-01-11T22:54:21.0742353Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0742723Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0743076Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0743446Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0743833Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0744169Z self.run() 2023-01-11T22:54:21.0744481Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0744854Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0745372Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0745740Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.0746264Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0746658Z getattr(self, test_name)() 2023-01-11T22:54:21.0747154Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0747520Z fn() 2023-01-11T22:54:21.0748011Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0748402Z test(self, **param_kwargs) 2023-01-11T22:54:21.0748895Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0749397Z return func(*args, **kwargs) 2023-01-11T22:54:21.0749802Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0750159Z self.run_subtests( 2023-01-11T22:54:21.0750664Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0751087Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0751693Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0752101Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0752659Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0753057Z output = model(*input) 2023-01-11T22:54:21.0753517Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0753905Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0754448Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.0754896Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0755441Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.0755836Z _lazy_init(state, module) 2023-01-11T22:54:21.0756343Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0756756Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0757344Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0757776Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.0758292Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0758655Z return func(*args, **kwargs) 2023-01-11T22:54:21.0759228Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0759612Z p_assert( 2023-01-11T22:54:21.0760070Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0760448Z traceback.print_stack() 2023-01-11T22:54:21.0760840Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0761327Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.0761691Z File "", line 1, in 2023-01-11T22:54:21.0762062Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0762439Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0762793Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0763161Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0763547Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0763862Z self.run() 2023-01-11T22:54:21.0764196Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0764560Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0765073Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0765447Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.0765975Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0766366Z getattr(self, test_name)() 2023-01-11T22:54:21.0766943Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0767310Z fn() 2023-01-11T22:54:21.0767802Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0768178Z test(self, **param_kwargs) 2023-01-11T22:54:21.0768746Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0769149Z return func(*args, **kwargs) 2023-01-11T22:54:21.0769555Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0769915Z self.run_subtests( 2023-01-11T22:54:21.0770421Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0770843Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0771378Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0771802Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0772352Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0772748Z output = model(*input) 2023-01-11T22:54:21.0773414Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0773805Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0774350Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.0774782Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0775342Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.0775741Z _lazy_init(state, module) 2023-01-11T22:54:21.0776025Z File "", line 1, in 2023-01-11T22:54:21.0776524Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0776955Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0777545Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0777958Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.0778346Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0778721Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0779234Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0779596Z return func(*args, **kwargs) 2023-01-11T22:54:21.0779950Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0780330Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0780874Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0781259Z p_assert( 2023-01-11T22:54:21.0781604Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0781923Z self.run() 2023-01-11T22:54:21.0782397Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0782777Z traceback.print_stack() 2023-01-11T22:54:21.0783137Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0783490Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0784003Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0784488Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.0785004Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0785398Z getattr(self, test_name)() 2023-01-11T22:54:21.0785916Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0786267Z fn() 2023-01-11T22:54:21.0786835Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0787250Z test(self, **param_kwargs) 2023-01-11T22:54:21.0787767Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0788139Z 
return func(*args, **kwargs) 2023-01-11T22:54:21.0788545Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0788931Z self.run_subtests( 2023-01-11T22:54:21.0789415Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0789835Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0790391Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0790812Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0791346Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0791742Z output = model(*input) 2023-01-11T22:54:21.0792222Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0792588Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0793133Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.0793585Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0794145Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.0794522Z _lazy_init(state, module) 2023-01-11T22:54:21.0795031Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0795468Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0796037Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0796470Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.0796978Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0797356Z return func(*args, **kwargs) 2023-01-11T22:54:21.0797869Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0798256Z p_assert( 2023-01-11T22:54:21.0798727Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0799092Z traceback.print_stack() 2023-01-11T22:54:21.0799483Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0799969Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0800350Z File "", line 1, in 2023-01-11T22:54:21.0800700Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0801069Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0801439Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0801790Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0802250Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0802586Z self.run() 2023-01-11T22:54:21.0802903Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0803270Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0803793Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0804231Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.0804754Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0805150Z getattr(self, test_name)() 2023-01-11T22:54:21.0805665Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0806013Z fn() 2023-01-11T22:54:21.0806500Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0806899Z test(self, **param_kwargs) 2023-01-11T22:54:21.0807391Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0807781Z return func(*args, **kwargs) 2023-01-11T22:54:21.0808183Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0808559Z self.run_subtests( 2023-01-11T22:54:21.0809042Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0809467Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0810015Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0810418Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0810972Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0811370Z output = model(*input) 2023-01-11T22:54:21.0811846Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0812212Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0812758Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.0813399Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0813952Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.0814348Z _lazy_init(state, module) 2023-01-11T22:54:21.0814857Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0815287Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0815863Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0816295Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.0816808Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0817173Z return func(*args, **kwargs) 2023-01-11T22:54:21.0817712Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0818092Z p_assert( 2023-01-11T22:54:21.0818355Z File "", line 1, in 2023-01-11T22:54:21.0818839Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0819225Z traceback.print_stack() 2023-01-11T22:54:21.0819594Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0820044Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0820414Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0820783Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0821153Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0821493Z self.run() 2023-01-11T22:54:21.0821886Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0822261Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0822764Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0823156Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.0823683Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0824059Z getattr(self, test_name)() 2023-01-11T22:54:21.0824577Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0824941Z fn() 2023-01-11T22:54:21.0825414Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0825807Z test(self, **param_kwargs) 2023-01-11T22:54:21.0826323Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0826724Z return func(*args, **kwargs) 2023-01-11T22:54:21.0827119Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0827494Z self.run_subtests( 2023-01-11T22:54:21.0827999Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0828399Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0828963Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0829385Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0829931Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0830307Z output = model(*input) 2023-01-11T22:54:21.0830785Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0831208Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0831737Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.0832209Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0832775Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.0833169Z _lazy_init(state, module) 2023-01-11T22:54:21.0833661Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0834100Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0834692Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0835130Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.0835626Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0836004Z return func(*args, **kwargs) 2023-01-11T22:54:21.0836538Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0836902Z p_assert( 2023-01-11T22:54:21.0837364Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0837816Z traceback.print_stack() 2023-01-11T22:54:21.0838195Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0838717Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.0839097Z File "", line 1, in 2023-01-11T22:54:21.0839526Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0839887Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0840256Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0840630Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0841019Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0841336Z self.run() 2023-01-11T22:54:21.0841674Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0842051Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0842586Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0842965Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.0843256Z File "", line 1, in 2023-01-11T22:54:21.0843778Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0844159Z getattr(self, test_name)() 2023-01-11T22:54:21.0844678Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0845042Z fn() 2023-01-11T22:54:21.0845376Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0845725Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0846260Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0846701Z test(self, **param_kwargs) 2023-01-11T22:54:21.0847040Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0847409Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0847947Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0848344Z return func(*args, **kwargs) 2023-01-11T22:54:21.0848692Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0849023Z self.run() 2023-01-11T22:54:21.0849404Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0849762Z self.run_subtests( 2023-01-11T22:54:21.0850105Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0850472Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0850990Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0851414Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0851953Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0852343Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.0853037Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0853480Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0854028Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0854412Z getattr(self, test_name)() 2023-01-11T22:54:21.0854953Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0855444Z output = model(*input) 2023-01-11T22:54:21.0855957Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0856303Z fn() 2023-01-11T22:54:21.0856751Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0857132Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0857729Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0858115Z test(self, **param_kwargs) 2023-01-11T22:54:21.0858654Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.0859102Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0859662Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0860060Z return func(*args, **kwargs) 2023-01-11T22:54:21.0860575Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.0860962Z _lazy_init(state, module) 2023-01-11T22:54:21.0861344Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0861721Z self.run_subtests( 2023-01-11T22:54:21.0862222Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0862634Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0863195Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0863630Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0864217Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0864633Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.0865165Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", 
line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0865581Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0866099Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0866462Z return func(*args, **kwargs) 2023-01-11T22:54:21.0866987Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0867378Z output = model(*input) 2023-01-11T22:54:21.0867888Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0868268Z p_assert( 2023-01-11T22:54:21.0868731Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0869113Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0869599Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0869982Z traceback.print_stack() 2023-01-11T22:54:21.0870520Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.0871005Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0871599Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.0872004Z _lazy_init(state, module) 2023-01-11T22:54:21.0872508Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0872918Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0873604Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0874026Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.0874558Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0874948Z return func(*args, **kwargs) 2023-01-11T22:54:21.0875541Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0875918Z p_assert( 2023-01-11T22:54:21.0876401Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0876811Z traceback.print_stack() 2023-01-11T22:54:21.0877217Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0877698Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0878085Z File "", line 1, in 2023-01-11T22:54:21.0878466Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0878824Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0879196Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0879573Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0879877Z File "", line 1, in 2023-01-11T22:54:21.0880231Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0880569Z self.run() 2023-01-11T22:54:21.0880902Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0881252Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0881630Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0882004Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0882501Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0882893Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.0883258Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0883632Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0884157Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0884554Z getattr(self, test_name)() 2023-01-11T22:54:21.0884920Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0885241Z self.run() 2023-01-11T22:54:21.0885733Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0886101Z fn() 2023-01-11T22:54:21.0886412Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0886782Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0887321Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0887720Z test(self, **param_kwargs) 2023-01-11T22:54:21.0888197Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0888589Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.0889107Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0889490Z return func(*args, **kwargs) 2023-01-11T22:54:21.0890006Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0890396Z getattr(self, test_name)() 2023-01-11T22:54:21.0890913Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0891274Z self.run_subtests( 2023-01-11T22:54:21.0891788Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 
2023-01-11T22:54:21.0892157Z fn() 2023-01-11T22:54:21.0892666Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0893309Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0893875Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0894260Z test(self, **param_kwargs) 2023-01-11T22:54:21.0894776Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0895200Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0895739Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0896115Z return func(*args, **kwargs) 2023-01-11T22:54:21.0896644Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0897040Z output = model(*input) 2023-01-11T22:54:21.0897424Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0897804Z self.run_subtests( 2023-01-11T22:54:21.0898278Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0898664Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0899167Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0899592Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0900155Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.0900592Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0901154Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0901581Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0902130Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.0902513Z _lazy_init(state, module) 2023-01-11T22:54:21.0903049Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0903448Z output = model(*input) 2023-01-11T22:54:21.0903934Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0904374Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0904891Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0905278Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0905828Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0906268Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.0906818Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.0907259Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0907792Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0908176Z return func(*args, **kwargs) 2023-01-11T22:54:21.0908805Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.0909179Z _lazy_init(state, module) 2023-01-11T22:54:21.0909712Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", 
line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0910098Z p_assert( 2023-01-11T22:54:21.0910628Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0911085Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0911618Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0912005Z traceback.print_stack() 2023-01-11T22:54:21.0912541Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0912974Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.0913491Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0913856Z return func(*args, **kwargs) 2023-01-11T22:54:21.0914391Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0914777Z p_assert( 2023-01-11T22:54:21.0915251Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0915619Z traceback.print_stack() 2023-01-11T22:54:21.0916015Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0916506Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.0916880Z File "", line 1, in 2023-01-11T22:54:21.0917169Z File "", line 1, in 2023-01-11T22:54:21.0917545Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0917901Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0918273Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0918641Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0919022Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0919379Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0919756Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0920093Z self.run() 2023-01-11T22:54:21.0920406Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0920779Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0921152Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0921502Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0921885Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0922222Z self.run() 2023-01-11T22:54:21.0922699Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0923076Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.0923443Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0923815Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0924335Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0924731Z getattr(self, test_name)() 2023-01-11T22:54:21.0925225Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in 
_run 2023-01-11T22:54:21.0925594Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.0926113Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0926557Z fn() 2023-01-11T22:54:21.0927046Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0927422Z getattr(self, test_name)() 2023-01-11T22:54:21.0927942Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0928385Z test(self, **param_kwargs) 2023-01-11T22:54:21.0928891Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0929263Z fn() 2023-01-11T22:54:21.0929744Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0930134Z return func(*args, **kwargs) 2023-01-11T22:54:21.0930636Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0931037Z test(self, **param_kwargs) 2023-01-11T22:54:21.0931439Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0931799Z self.run_subtests( 2023-01-11T22:54:21.0932301Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0932696Z return func(*args, **kwargs) 2023-01-11T22:54:21.0933395Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0933824Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0934257Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 
2023-01-11T22:54:21.0934635Z self.run_subtests( 2023-01-11T22:54:21.0935130Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0935557Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0936087Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0936498Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0937057Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0937455Z output = model(*input) 2023-01-11T22:54:21.0937968Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0938372Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0938872Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0939261Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0939790Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0940188Z output = model(*input) 2023-01-11T22:54:21.0940713Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.0941168Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0941675Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0942064Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0942598Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.0942975Z _lazy_init(state, module) 2023-01-11T22:54:21.0943501Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.0944052Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0944610Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0945029Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0945589Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.0946051Z _lazy_init(state, module) 2023-01-11T22:54:21.0946602Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0947040Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.0947567Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0948004Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0948519Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0948898Z return func(*args, **kwargs) 2023-01-11T22:54:21.0949452Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0949884Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.0950423Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0950808Z p_assert( 2023-01-11T22:54:21.0951279Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0951646Z return func(*args, **kwargs) 2023-01-11T22:54:21.0952138Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0952521Z traceback.print_stack() 2023-01-11T22:54:21.0953040Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0953426Z p_assert( 2023-01-11T22:54:21.0953889Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0954268Z traceback.print_stack() 2023-01-11T22:54:21.0954643Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0955136Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0955522Z File "", line 1, in 2023-01-11T22:54:21.0955877Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0956249Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0956620Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0956974Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0957365Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0957702Z self.run() 2023-01-11T22:54:21.0958034Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0958381Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0958895Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0959295Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.0959832Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0960234Z getattr(self, test_name)() 2023-01-11T22:54:21.0960753Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0961126Z fn() 2023-01-11T22:54:21.0961597Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0962064Z test(self, **param_kwargs) 2023-01-11T22:54:21.0962580Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0962954Z return func(*args, **kwargs) 2023-01-11T22:54:21.0963356Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0963787Z self.run_subtests( 2023-01-11T22:54:21.0964289Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0964721Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0965270Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0965693Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0966228Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0966628Z output = model(*input) 2023-01-11T22:54:21.0967101Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0967468Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0968010Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.0968466Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0969030Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.0969409Z _lazy_init(state, module) 2023-01-11T22:54:21.0969911Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0970348Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0970921Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0971358Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.0971867Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0972251Z return func(*args, **kwargs) 2023-01-11T22:54:21.0972767Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0973358Z p_assert( 2023-01-11T22:54:21.0973833Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0974202Z traceback.print_stack() 2023-01-11T22:54:21.0974487Z File "", line 1, in 2023-01-11T22:54:21.0974854Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.0975231Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.0975584Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.0975954Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.0976343Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.0976664Z self.run() 2023-01-11T22:54:21.0976998Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.0977364Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.0977863Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.0978254Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.0978780Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.0979174Z getattr(self, test_name)() 2023-01-11T22:54:21.0979770Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.0980143Z fn() 2023-01-11T22:54:21.0980633Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.0981012Z test(self, **param_kwargs) 2023-01-11T22:54:21.0981587Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.0981997Z return func(*args, **kwargs) 2023-01-11T22:54:21.0982382Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.0982756Z self.run_subtests( 2023-01-11T22:54:21.0983262Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.0983688Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.0984223Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.0984642Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.0985195Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.0985574Z output = model(*input) 2023-01-11T22:54:21.0986054Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.0986444Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.0986985Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.0987421Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.0987981Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.0988376Z _lazy_init(state, module) 2023-01-11T22:54:21.0988865Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.0989299Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.0989887Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.0990324Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.0990822Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.0991204Z return func(*args, **kwargs) 2023-01-11T22:54:21.0991741Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.0992111Z p_assert( 2023-01-11T22:54:21.0992579Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.0992967Z traceback.print_stack() 2023-01-11T22:54:21.0993361Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0993831Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.0994837Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.0996084Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0997430Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0998727Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.0999986Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1001203Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1002441Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1003665Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1004299Z File "", line 1, in 2023-01-11T22:54:21.1004672Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1005028Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1005402Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1005773Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1006163Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1006485Z self.run() 2023-01-11T22:54:21.1006818Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1007185Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1007687Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1008080Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1008606Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1008986Z getattr(self, test_name)() 2023-01-11T22:54:21.1009501Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1009874Z fn() 2023-01-11T22:54:21.1010363Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1010746Z test(self, **param_kwargs) 2023-01-11T22:54:21.1011254Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1011648Z return func(*args, **kwargs) 2023-01-11T22:54:21.1012036Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1012412Z self.run_subtests( 2023-01-11T22:54:21.1013118Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1013666Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1014210Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1014635Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1015255Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1015648Z output = model(*input) 2023-01-11T22:54:21.1016134Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1016523Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1017070Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1017509Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1018078Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1018471Z _lazy_init(state, module) 2023-01-11T22:54:21.1018959Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1019396Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1019987Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, 
in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1020426Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1020920Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1021304Z return func(*args, **kwargs) 2023-01-11T22:54:21.1021838Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1022213Z p_assert( 2023-01-11T22:54:21.1022681Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1023067Z traceback.print_stack() 2023-01-11T22:54:21.1023339Z File "", line 1, in 2023-01-11T22:54:21.1023709Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1024085Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1024454Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1024806Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1025193Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1025637Z self.run() 2023-01-11T22:54:21.1025997Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1026416Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1027152Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1027567Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1028208Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1028678Z getattr(self, test_name)() 2023-01-11T22:54:21.1029325Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1029727Z fn() 
2023-01-11T22:54:21.1030298Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1030773Z test(self, **param_kwargs) 2023-01-11T22:54:21.1031306Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1031824Z return func(*args, **kwargs) 2023-01-11T22:54:21.1032422Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1032873Z self.run_subtests( 2023-01-11T22:54:21.1033417Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1033913Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1034629Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1035086Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1035733Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1036207Z output = model(*input) 2023-01-11T22:54:21.1036798Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1037249Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1037879Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1038405Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1038991Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1039499Z _lazy_init(state, module) 2023-01-11T22:54:21.1040103Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1040613Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1041226Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1041739Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1042406Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1042821Z return func(*args, **kwargs) 2023-01-11T22:54:21.1043426Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1043984Z p_assert( 2023-01-11T22:54:21.1044572Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1044984Z traceback.print_stack() 2023-01-11T22:54:21.1045450Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1046014Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1046424Z File "", line 1, in 2023-01-11T22:54:21.1046942Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1047388Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1047790Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1048244Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1048728Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1049170Z self.run() 2023-01-11T22:54:21.1049524Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1049986Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1050580Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1051023Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1051657Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1052131Z getattr(self, test_name)() 2023-01-11T22:54:21.1052697Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1053421Z fn() 2023-01-11T22:54:21.1054055Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1054542Z test(self, **param_kwargs) 2023-01-11T22:54:21.1055074Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1055698Z return func(*args, **kwargs) 2023-01-11T22:54:21.1056240Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1056639Z self.run_subtests( 2023-01-11T22:54:21.1057239Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1057748Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1058387Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1058844Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1059526Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1060000Z output = model(*input) 2023-01-11T22:54:21.1060573Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1061090Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1061720Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1062250Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1063091Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1063579Z _lazy_init(state, module) 2023-01-11T22:54:21.1064180Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1064639Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1065355Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1065917Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1067042Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1067490Z return func(*args, **kwargs) 2023-01-11T22:54:21.1068188Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1068721Z p_assert( 2023-01-11T22:54:21.1069219Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1069734Z traceback.print_stack() 2023-01-11T22:54:21.1070258Z File "", line 1, in 2023-01-11T22:54:21.1070753Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1071219Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1071678Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1072135Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1072540Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1072991Z self.run() 2023-01-11T22:54:21.1073481Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1073879Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1074521Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1074992Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1075749Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1076167Z getattr(self, test_name)() 2023-01-11T22:54:21.1076775Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1077217Z fn() 2023-01-11T22:54:21.1077787Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1078432Z test(self, **param_kwargs) 2023-01-11T22:54:21.1079041Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.1079518Z return func(*args, **kwargs) 2023-01-11T22:54:21.1080024Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1080517Z self.run_subtests( 2023-01-11T22:54:21.1081277Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1081742Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1082356Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1082932Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1083388Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1083547Z output = model(*input) 2023-01-11T22:54:21.1083920Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1084071Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1084493Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1084652Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1085064Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1085230Z _lazy_init(state, module) 2023-01-11T22:54:21.1085623Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1085834Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1086444Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1086649Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.1087394Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1087522Z return func(*args, **kwargs) 2023-01-11T22:54:21.1087954Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1088106Z p_assert( 2023-01-11T22:54:21.1088459Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1088672Z traceback.print_stack() 2023-01-11T22:54:21.1088948Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1089238Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1089407Z File "", line 1, in 2023-01-11T22:54:21.1089602Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1089814Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1090051Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1090272Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1090531Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1090764Z self.run() 2023-01-11T22:54:21.1091415Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1091622Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1091961Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1092770Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1093272Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1093446Z getattr(self, test_name)() 2023-01-11T22:54:21.1093859Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1094006Z fn() 2023-01-11T22:54:21.1094412Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1094575Z test(self, **param_kwargs) 2023-01-11T22:54:21.1094915Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1095278Z return func(*args, **kwargs) 2023-01-11T22:54:21.1095574Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1095724Z self.run_subtests( 2023-01-11T22:54:21.1096126Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1096338Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1097129Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1097342Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1097711Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1097912Z output = model(*input) 2023-01-11T22:54:21.1098308Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1098484Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1098906Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1099180Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1099619Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.1099778Z _lazy_init(state, module) 2023-01-11T22:54:21.1100114Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1100322Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1100845Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1101039Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1101241Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.1101632Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1101825Z return func(*args, **kwargs) 2023-01-11T22:54:21.1102245Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1102334Z p_assert( 2023-01-11T22:54:21.1102633Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1102805Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1103237Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1103410Z traceback.print_stack() 2023-01-11T22:54:21.1103747Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1103934Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1104184Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1104274Z self.run() 2023-01-11T22:54:21.1104532Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1104812Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1105251Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1105420Z
self.run_test(test_name, pipe) 2023-01-11T22:54:21.1105856Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1106018Z getattr(self, test_name)() 2023-01-11T22:54:21.1106413Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1106500Z fn() 2023-01-11T22:54:21.1106945Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1107116Z test(self, **param_kwargs) 2023-01-11T22:54:21.1107509Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1107673Z return func(*args, **kwargs) 2023-01-11T22:54:21.1107966Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1108115Z self.run_subtests( 2023-01-11T22:54:21.1108512Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1108661Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1109062Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1109300Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1109714Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1109901Z output = model(*input) 2023-01-11T22:54:21.1110309Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1110587Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1111016Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.1111176Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1111693Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1111852Z _lazy_init(state, module) 2023-01-11T22:54:21.1112303Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1112544Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1112982Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1113326Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1113719Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1113830Z return func(*args, **kwargs) 2023-01-11T22:54:21.1114262Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1114408Z p_assert( 2023-01-11T22:54:21.1114816Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1115053Z traceback.print_stack() 2023-01-11T22:54:21.1115332Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1115604Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1116463Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1117378Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1117560Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.1117764Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1117944Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1118187Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1118521Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1119317Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1119789Z self.run() 2023-01-11T22:54:21.1120049Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1120233Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1120571Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1120741Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1121149Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1121356Z getattr(self, test_name)() 2023-01-11T22:54:21.1121763Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1121911Z fn() 2023-01-11T22:54:21.1122321Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1122486Z test(self, **param_kwargs) 2023-01-11T22:54:21.1122830Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line
167, in wrapper 2023-01-11T22:54:21.1123033Z return func(*args, **kwargs) 2023-01-11T22:54:21.1123327Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1123478Z self.run_subtests( 2023-01-11T22:54:21.1123915Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1124119Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1124549Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1124776Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1125141Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1125300Z output = model(*input) 2023-01-11T22:54:21.1125667Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1125849Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1126271Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1126488Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1126926Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1127170Z _lazy_init(state, module) 2023-01-11T22:54:21.1127519Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1127768Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1128255Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1128456Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.1128880Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1129047Z return func(*args, **kwargs) 2023-01-11T22:54:21.1129465Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1129604Z p_assert( 2023-01-11T22:54:21.1129932Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1130101Z traceback.print_stack() 2023-01-11T22:54:21.1130264Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.1130975Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1131172Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1131418Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1131608Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1131936Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1132029Z self.run() 2023-01-11T22:54:21.1132273Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1132489Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1133149Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1133340Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1133888Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1134057Z getattr(self, test_name)() 2023-01-11T22:54:21.1134467Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1134555Z fn() 2023-01-11T22:54:21.1134938Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1135131Z test(self, **param_kwargs) 2023-01-11T22:54:21.1135540Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1135702Z return func(*args, **kwargs) 2023-01-11T22:54:21.1136385Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1136627Z self.run_subtests( 2023-01-11T22:54:21.1137092Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1137243Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1137652Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1137917Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1138387Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1138546Z output = model(*input) 2023-01-11T22:54:21.1138912Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1139126Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1139663Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1139824Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1140229Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1140401Z _lazy_init(state, module) 2023-01-11T22:54:21.1140856Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:54:21.1141071Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1141929Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1142431Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1142834Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1142952Z return func(*args, **kwargs) 2023-01-11T22:54:21.1143507Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1143662Z p_assert( 2023-01-11T22:54:21.1144094Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1144295Z traceback.print_stack() 2023-01-11T22:54:21.1144583Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1144856Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1145130Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1145347Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1145610Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1145879Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1146151Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1146415Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1146677Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1146944Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1147188Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1147491Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1147704Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1147971Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1148243Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1148516Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1148775Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1149035Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1149346Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1149615Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1150412Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1151295Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1152154Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1152956Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1153736Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1154524Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1155668Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:224: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 
2023-01-11T22:54:21.1156087Z local_num_valid_indices = torch.tensor([num_valid_indices], **tensor_kwargs) # type: ignore[arg-type, call-overload] 2023-01-11T22:54:21.1156306Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1156590Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1156859Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1157125Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1157424Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1157687Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1157983Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1158262Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1158532Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1158744Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1159008Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1159306Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1159570Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1213864Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1214140Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1214530Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1214752Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1214981Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1215263Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1215499Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1215708Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1215932Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1216160Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1216385Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1217182Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1217932Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1218669Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1219389Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1220123Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1220841Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1221589Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1222319Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1223046Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1223849Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1224617Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1225351Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1226079Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1226807Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1227529Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1228246Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1228971Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1229685Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1230405Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1231130Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1231849Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1232630Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1232903Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1233138Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1233356Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1233572Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1233684Z dist init r=1, world=2 2023-01-11T22:54:21.1234005Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1234316Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1234635Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1234948Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1235245Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1235562Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1235869Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1236165Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1236485Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1236787Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1237077Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1237395Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1237530Z dist init r=0, world=2 2023-01-11T22:54:21.1237829Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1238136Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1238440Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1238738Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1239041Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1239418Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1239775Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1240089Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1240383Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1240688Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1240999Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1241306Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.1241398Z ok (35.272s) 2023-01-11T22:54:21.1241621Z test_delayed_optim_step_offload_true_shard_grad_op (__main__.TestParityWithDDP) 2023-01-11T22:54:21.1241930Z Tests the FSDP forward, backward, and optimizer step runtime by ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83682 2023-01-11T22:54:21.1242151Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83683 2023-01-11T22:54:21.1242546Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.1242716Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.1243083Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.1243282Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.1243659Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.1243824Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.1244201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.1244378Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.1244626Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.1244859Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.1245258Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.1245637Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.1245864Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.1246096Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.1246321Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1246544Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1247558Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.1247726Z warnings.warn( 2023-01-11T22:54:21.1248767Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.1248885Z warnings.warn( 2023-01-11T22:54:21.1249007Z File "", line 1, in 2023-01-11T22:54:21.1249122Z File "", line 1, in 2023-01-11T22:54:21.1249341Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1249473Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1249670Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1249823Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1250039Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1250175Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1250377Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1250464Z self.run() 2023-01-11T22:54:21.1250667Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1250806Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1251014Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1251156Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1251360Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1251469Z self.run() 2023-01-11T22:54:21.1251815Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1251936Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1252131Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1252280Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1252637Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1252764Z getattr(self, test_name)() 2023-01-11T22:54:21.1253260Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1253387Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1253759Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1253843Z fn() 2023-01-11T22:54:21.1254196Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1254323Z getattr(self, test_name)() 2023-01-11T22:54:21.1254683Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1254796Z test(self, **param_kwargs) 2023-01-11T22:54:21.1255176Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1255264Z fn() 2023-01-11T22:54:21.1255599Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1255720Z return func(*args, **kwargs) 2023-01-11T22:54:21.1256192Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1256304Z test(self, **param_kwargs) 2023-01-11T22:54:21.1256556Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1256660Z self.run_subtests( 2023-01-11T22:54:21.1257077Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1257199Z return func(*args, **kwargs) 2023-01-11T22:54:21.1257541Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1257708Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1257947Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1258068Z self.run_subtests( 2023-01-11T22:54:21.1258429Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1258587Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1258926Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1259092Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1259460Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1259565Z output = model(*input) 2023-01-11T22:54:21.1259929Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1260076Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1260405Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1260548Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1260928Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1261050Z output = model(*input) 2023-01-11T22:54:21.1261431Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1261593Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1261919Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1262063Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1262431Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:54:21.1262556Z _lazy_init(state, module) 2023-01-11T22:54:21.1262935Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1263102Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1263456Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1263606Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1263976Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1264102Z _lazy_init(state, module) 2023-01-11T22:54:21.1264505Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1264651Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1264993Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1265168Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1265571Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1265681Z return func(*args, **kwargs) 2023-01-11T22:54:21.1266072Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1266216Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1266636Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1266737Z p_assert( 2023-01-11T22:54:21.1267084Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1267210Z return func(*args, 
**kwargs) 2023-01-11T22:54:21.1267553Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1267674Z traceback.print_stack() 2023-01-11T22:54:21.1268033Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1268137Z p_assert( 2023-01-11T22:54:21.1268464Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1268593Z traceback.print_stack() 2023-01-11T22:54:21.1268836Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1269077Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1269210Z File "", line 1, in 2023-01-11T22:54:21.1269321Z File "", line 1, in 2023-01-11T22:54:21.1269521Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1269665Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1269873Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1270019Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1270230Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1270378Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1270592Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1270680Z self.run() 2023-01-11T22:54:21.1270873Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1271031Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1271223Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1271372Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1271572Z File 
"/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1271678Z self.run() 2023-01-11T22:54:21.1272015Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1272144Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1272351Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1272485Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1272859Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1272973Z getattr(self, test_name)() 2023-01-11T22:54:21.1273295Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1273434Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1273776Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1273865Z fn() 2023-01-11T22:54:21.1274215Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1274392Z getattr(self, test_name)() 2023-01-11T22:54:21.1274770Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1274883Z test(self, **param_kwargs) 2023-01-11T22:54:21.1275289Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1275383Z fn() 2023-01-11T22:54:21.1275726Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1275859Z return func(*args, **kwargs) 2023-01-11T22:54:21.1276214Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:54:21.1276338Z test(self, **param_kwargs) 2023-01-11T22:54:21.1276579Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1276698Z self.run_subtests( 2023-01-11T22:54:21.1277052Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1277179Z return func(*args, **kwargs) 2023-01-11T22:54:21.1277512Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1277682Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1277923Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1278038Z self.run_subtests( 2023-01-11T22:54:21.1278391Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1278545Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1278897Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1279063Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1279421Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1279544Z output = model(*input) 2023-01-11T22:54:21.1279898Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1280055Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1280383Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1280512Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1280888Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1281011Z output = model(*input) 2023-01-11T22:54:21.1281370Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1281537Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1281864Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1281993Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1282365Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1282491Z _lazy_init(state, module) 2023-01-11T22:54:21.1282856Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1283029Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1283371Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1283586Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1283963Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1284087Z _lazy_init(state, module) 2023-01-11T22:54:21.1284517Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1284672Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1285033Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1285190Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1285527Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1285636Z return func(*args, **kwargs) 2023-01-11T22:54:21.1286040Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1286173Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1286552Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1286646Z p_assert( 2023-01-11T22:54:21.1286981Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1287095Z return func(*args, **kwargs) 2023-01-11T22:54:21.1287432Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1287541Z traceback.print_stack() 2023-01-11T22:54:21.1287908Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1288001Z p_assert( 2023-01-11T22:54:21.1288341Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1288457Z traceback.print_stack() 2023-01-11T22:54:21.1288697Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1288916Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1289036Z File "", line 1, in 2023-01-11T22:54:21.1289152Z File "", line 1, in 2023-01-11T22:54:21.1289363Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1289497Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1289695Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1289837Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1290026Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1290182Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1290368Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1290506Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1290708Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1290813Z self.run() 2023-01-11T22:54:21.1291024Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1291116Z self.run() 2023-01-11T22:54:21.1291304Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1291447Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1291639Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1291771Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1292122Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1292316Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1292655Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1292788Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1293409Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in 
run_test 2023-01-11T22:54:21.1293595Z getattr(self, test_name)() 2023-01-11T22:54:21.1293964Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1294079Z getattr(self, test_name)() 2023-01-11T22:54:21.1294441Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1294530Z fn() 2023-01-11T22:54:21.1294879Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1294972Z fn() 2023-01-11T22:54:21.1295332Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1295437Z test(self, **param_kwargs) 2023-01-11T22:54:21.1295802Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1295913Z test(self, **param_kwargs) 2023-01-11T22:54:21.1296264Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1296391Z return func(*args, **kwargs) 2023-01-11T22:54:21.1296747Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1296872Z return func(*args, **kwargs) 2023-01-11T22:54:21.1297123Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1297225Z self.run_subtests( 2023-01-11T22:54:21.1297468Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1297581Z self.run_subtests( 2023-01-11T22:54:21.1297935Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1298102Z 
test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1298448Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1298606Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1298967Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1299104Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1299460Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1299616Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1299992Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1300113Z output = model(*input) 2023-01-11T22:54:21.1300494Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1300615Z output = model(*input) 2023-01-11T22:54:21.1300943Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1301068Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1301390Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1301532Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1302054Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1302232Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1302612Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1302789Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 
2023-01-11T22:54:21.1303212Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1303327Z _lazy_init(state, module) 2023-01-11T22:54:21.1303702Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1303827Z _lazy_init(state, module) 2023-01-11T22:54:21.1304184Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1304358Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1304715Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1304884Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1305284Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1305431Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1305811Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1305954Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1306292Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1306419Z return func(*args, **kwargs) 2023-01-11T22:54:21.1306761Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1306888Z return func(*args, **kwargs) 2023-01-11T22:54:21.1307265Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1307370Z p_assert( 2023-01-11T22:54:21.1307726Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1307828Z p_assert( 2023-01-11T22:54:21.1308167Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1308295Z traceback.print_stack() 2023-01-11T22:54:21.1308628Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1308754Z traceback.print_stack() 2023-01-11T22:54:21.1308993Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1309236Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1309351Z File "", line 1, in 2023-01-11T22:54:21.1309480Z File "", line 1, in 2023-01-11T22:54:21.1309695Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1309841Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1310047Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1310200Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1310410Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1310534Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1310747Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1310852Z self.run() 2023-01-11T22:54:21.1311134Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1311288Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1311495Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1311644Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1311857Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 
314, in _bootstrap 2023-01-11T22:54:21.1311988Z self.run() 2023-01-11T22:54:21.1312350Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1312487Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1312692Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1312841Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1313207Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1313335Z getattr(self, test_name)() 2023-01-11T22:54:21.1313656Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1313792Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1314155Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1314254Z fn() 2023-01-11T22:54:21.1314617Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1314742Z getattr(self, test_name)() 2023-01-11T22:54:21.1315108Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1315231Z test(self, **param_kwargs) 2023-01-11T22:54:21.1315573Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1315676Z fn() 2023-01-11T22:54:21.1316027Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1316153Z return func(*args, **kwargs) 2023-01-11T22:54:21.1316522Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1316647Z test(self, **param_kwargs) 
2023-01-11T22:54:21.1316898Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1317015Z self.run_subtests( 2023-01-11T22:54:21.1317360Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1317487Z return func(*args, **kwargs) 2023-01-11T22:54:21.1317841Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1318007Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1318260Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1318374Z self.run_subtests( 2023-01-11T22:54:21.1318737Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1318893Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1319230Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1319394Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1319769Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1319892Z output = model(*input) 2023-01-11T22:54:21.1320254Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1320472Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1320807Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1320947Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1321304Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in 
_train_for_several_steps 2023-01-11T22:54:21.1321472Z output = model(*input) 2023-01-11T22:54:21.1321863Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1322040Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1322366Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1322504Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1322876Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1323000Z _lazy_init(state, module) 2023-01-11T22:54:21.1323353Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1323528Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1323890Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1324059Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1324427Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1324551Z _lazy_init(state, module) 2023-01-11T22:54:21.1324952Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1325101Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1325454Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1325605Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1325944Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 
2023-01-11T22:54:21.1326074Z return func(*args, **kwargs) 2023-01-11T22:54:21.1326471Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1326614Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1326993Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1327097Z p_assert( 2023-01-11T22:54:21.1327428Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1327540Z return func(*args, **kwargs) 2023-01-11T22:54:21.1327875Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1328002Z traceback.print_stack() 2023-01-11T22:54:21.1328376Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1328481Z p_assert( 2023-01-11T22:54:21.1328811Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1328937Z traceback.print_stack() 2023-01-11T22:54:21.1329156Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1329394Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1329525Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.1329719Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.1329930Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1330076Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1330279Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1330433Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1330666Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1330815Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1331031Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1331136Z self.run() 2023-01-11T22:54:21.1331339Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1331493Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1331695Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1331827Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1332040Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1332145Z self.run() 2023-01-11T22:54:21.1332500Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1332636Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1332848Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1333173Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1333547Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1333655Z getattr(self, test_name)() 2023-01-11T22:54:21.1333996Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in 
_run 2023-01-11T22:54:21.1334130Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1334501Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1334602Z fn() 2023-01-11T22:54:21.1334961Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1335086Z getattr(self, test_name)() 2023-01-11T22:54:21.1335454Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1335560Z test(self, **param_kwargs) 2023-01-11T22:54:21.1335916Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1336016Z fn() 2023-01-11T22:54:21.1336367Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1336496Z return func(*args, **kwargs) 2023-01-11T22:54:21.1336866Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1336991Z test(self, **param_kwargs) 2023-01-11T22:54:21.1337240Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1337339Z self.run_subtests( 2023-01-11T22:54:21.1337703Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1337835Z return func(*args, **kwargs) 2023-01-11T22:54:21.1338189Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1338354Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1338605Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 
2023-01-11T22:54:21.1338719Z self.run_subtests( 2023-01-11T22:54:21.1339186Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1339323Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1339673Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1339836Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1340273Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1340403Z output = model(*input) 2023-01-11T22:54:21.1340772Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1340927Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1341253Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1341378Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1341749Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1341869Z output = model(*input) 2023-01-11T22:54:21.1342247Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1342427Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1342756Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1342897Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1343264Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1343370Z _lazy_init(state, module) 2023-01-11T22:54:21.1343743Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1343923Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1344276Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1344445Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1344813Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1344937Z _lazy_init(state, module) 2023-01-11T22:54:21.1345337Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1345464Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1345821Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1345996Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1346336Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1346463Z return func(*args, **kwargs) 2023-01-11T22:54:21.1346861Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1347008Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1347385Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1347472Z p_assert( 2023-01-11T22:54:21.1347806Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1347934Z return func(*args, **kwargs) 2023-01-11T22:54:21.1348270Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1348464Z traceback.print_stack() 2023-01-11T22:54:21.1348846Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1348951Z p_assert( 2023-01-11T22:54:21.1349289Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1349398Z traceback.print_stack() 2023-01-11T22:54:21.1349685Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1349925Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1350058Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.1350188Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.1350401Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1350547Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1350755Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1350891Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1351100Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1351243Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1351456Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1351566Z self.run() 2023-01-11T22:54:21.1351769Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1351921Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1352105Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1352255Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1352467Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 
314, in _bootstrap 2023-01-11T22:54:21.1352574Z self.run() 2023-01-11T22:54:21.1352928Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1353064Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1353269Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1353418Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1353765Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1353893Z getattr(self, test_name)() 2023-01-11T22:54:21.1354232Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1354365Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1354724Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1354826Z fn() 2023-01-11T22:54:21.1355185Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1355313Z getattr(self, test_name)() 2023-01-11T22:54:21.1355686Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1355811Z test(self, **param_kwargs) 2023-01-11T22:54:21.1356174Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1356275Z fn() 2023-01-11T22:54:21.1356623Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1356748Z return func(*args, **kwargs) 2023-01-11T22:54:21.1357115Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1357240Z test(self, **param_kwargs) 
2023-01-11T22:54:21.1357473Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1357651Z self.run_subtests( 2023-01-11T22:54:21.1358017Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1358143Z return func(*args, **kwargs) 2023-01-11T22:54:21.1358495Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1358709Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1358967Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1359083Z self.run_subtests( 2023-01-11T22:54:21.1359435Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1359589Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1359944Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1360103Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1360478Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1360600Z output = model(*input) 2023-01-11T22:54:21.1360965Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1361121Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1361429Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1361571Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1361947Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in 
_train_for_several_steps 2023-01-11T22:54:21.1362072Z output = model(*input) 2023-01-11T22:54:21.1362452Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1362629Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1362958Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1363098Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1363450Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1363575Z _lazy_init(state, module) 2023-01-11T22:54:21.1363949Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1364124Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1364478Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1364652Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1365020Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1365144Z _lazy_init(state, module) 2023-01-11T22:54:21.1365526Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1365672Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1366027Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1366197Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1366537Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 
2023-01-11T22:54:21.1366664Z return func(*args, **kwargs) 2023-01-11T22:54:21.1367134Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1367278Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1367638Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1367743Z p_assert( 2023-01-11T22:54:21.1368143Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1368278Z return func(*args, **kwargs) 2023-01-11T22:54:21.1368623Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1368753Z traceback.print_stack() 2023-01-11T22:54:21.1369125Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1369227Z p_assert( 2023-01-11T22:54:21.1369544Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1369671Z traceback.print_stack() 2023-01-11T22:54:21.1369910Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1370146Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1370277Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.1370413Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.1370624Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1370769Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1370962Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1371102Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1371304Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1371460Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1371657Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1371808Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1372021Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1372109Z self.run() 2023-01-11T22:54:21.1372324Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1372428Z self.run() 2023-01-11T22:54:21.1372627Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1372776Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1373167Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1373319Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1373670Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1373794Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1374133Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1374267Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1374631Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in 
run_test 2023-01-11T22:54:21.1374760Z getattr(self, test_name)() 2023-01-11T22:54:21.1375122Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1375245Z getattr(self, test_name)() 2023-01-11T22:54:21.1375604Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1375688Z fn() 2023-01-11T22:54:21.1376048Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1376231Z fn() 2023-01-11T22:54:21.1376607Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1376731Z test(self, **param_kwargs) 2023-01-11T22:54:21.1377094Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1377279Z test(self, **param_kwargs) 2023-01-11T22:54:21.1377634Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1377761Z return func(*args, **kwargs) 2023-01-11T22:54:21.1378117Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1378241Z return func(*args, **kwargs) 2023-01-11T22:54:21.1378493Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1378613Z self.run_subtests( 2023-01-11T22:54:21.1378864Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1378979Z self.run_subtests( 2023-01-11T22:54:21.1379314Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1379482Z 
test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1379828Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1379988Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1380352Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1380508Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1380863Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1381020Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1381394Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1381500Z output = model(*input) 2023-01-11T22:54:21.1381877Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1381996Z output = model(*input) 2023-01-11T22:54:21.1382326Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1382466Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1382785Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1382924Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1383304Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1383462Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1383836Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1384008Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 
2023-01-11T22:54:21.1384376Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1384500Z _lazy_init(state, module) 2023-01-11T22:54:21.1384867Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1384989Z _lazy_init(state, module) 2023-01-11T22:54:21.1385341Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1385558Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1385920Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1386088Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1386535Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1386697Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1387102Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1387245Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1387582Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1387691Z return func(*args, **kwargs) 2023-01-11T22:54:21.1388034Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1388160Z return func(*args, **kwargs) 2023-01-11T22:54:21.1388537Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1388642Z p_assert( 2023-01-11T22:54:21.1389021Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1389125Z p_assert( 2023-01-11T22:54:21.1389463Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1389575Z traceback.print_stack() 2023-01-11T22:54:21.1389910Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1390036Z traceback.print_stack() 2023-01-11T22:54:21.1390273Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1390514Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1390646Z File "", line 1, in 2023-01-11T22:54:21.1390775Z File "", line 1, in 2023-01-11T22:54:21.1390984Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1391112Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1391316Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1391470Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1391679Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1391822Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1392036Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1392140Z self.run() 2023-01-11T22:54:21.1392330Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1392484Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1392688Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1392837Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1393052Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 
314, in _bootstrap 2023-01-11T22:54:21.1393158Z self.run() 2023-01-11T22:54:21.1393509Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1393645Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1393830Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1393978Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1394341Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1394530Z getattr(self, test_name)() 2023-01-11T22:54:21.1394881Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1395015Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1395376Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1395459Z fn() 2023-01-11T22:54:21.1395867Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1395998Z getattr(self, test_name)() 2023-01-11T22:54:21.1396371Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1396494Z test(self, **param_kwargs) 2023-01-11T22:54:21.1396853Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1396957Z fn() 2023-01-11T22:54:21.1397310Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1397419Z return func(*args, **kwargs) 2023-01-11T22:54:21.1397787Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1397911Z test(self, **param_kwargs) 
2023-01-11T22:54:21.1398167Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1398282Z self.run_subtests( 2023-01-11T22:54:21.1398645Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1398770Z return func(*args, **kwargs) 2023-01-11T22:54:21.1399123Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1399274Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1399525Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1399639Z self.run_subtests( 2023-01-11T22:54:21.1400004Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1400157Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1400510Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1400673Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1401050Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1401152Z output = model(*input) 2023-01-11T22:54:21.1401518Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1401674Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1401998Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1402140Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1402515Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in 
_train_for_several_steps 2023-01-11T22:54:21.1402642Z output = model(*input) 2023-01-11T22:54:21.1403018Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1403178Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1403505Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1403645Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1404088Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1404211Z _lazy_init(state, module) 2023-01-11T22:54:21.1404586Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1404762Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1405164Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1405342Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1405697Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1405824Z _lazy_init(state, module) 2023-01-11T22:54:21.1406220Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1406370Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1406721Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1406891Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1407228Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 
2023-01-11T22:54:21.1407359Z return func(*args, **kwargs) 2023-01-11T22:54:21.1407740Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1407884Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1408264Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1408370Z p_assert( 2023-01-11T22:54:21.1408704Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1408832Z return func(*args, **kwargs) 2023-01-11T22:54:21.1409169Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1409296Z traceback.print_stack() 2023-01-11T22:54:21.1409652Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1409759Z p_assert( 2023-01-11T22:54:21.1410097Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1410224Z traceback.print_stack() 2023-01-11T22:54:21.1410463Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1410700Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1411454Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1412193Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1413096Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1413837Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1414725Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1415466Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1416210Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1416944Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1417081Z File "", line 1, in 2023-01-11T22:54:21.1417192Z File "", line 1, in 2023-01-11T22:54:21.1417411Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1417556Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1417763Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1417919Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1418132Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1418276Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1418473Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1418582Z self.run() 2023-01-11T22:54:21.1418790Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1418943Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1419143Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1419289Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1419502Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1419606Z self.run() 2023-01-11T22:54:21.1419933Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1420074Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1420278Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1420425Z 
self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1420793Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1420922Z getattr(self, test_name)() 2023-01-11T22:54:21.1421261Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1421394Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1421736Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1421835Z fn() 2023-01-11T22:54:21.1422196Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1422389Z getattr(self, test_name)() 2023-01-11T22:54:21.1422766Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1422892Z test(self, **param_kwargs) 2023-01-11T22:54:21.1423248Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1423347Z fn() 2023-01-11T22:54:21.1423737Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1423871Z return func(*args, **kwargs) 2023-01-11T22:54:21.1424246Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1424372Z test(self, **param_kwargs) 2023-01-11T22:54:21.1424625Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1424746Z self.run_subtests( 2023-01-11T22:54:21.1425105Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1425213Z return func(*args, **kwargs) 
2023-01-11T22:54:21.1425566Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1425729Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1425983Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1426098Z self.run_subtests( 2023-01-11T22:54:21.1426459Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1426615Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1426966Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1427133Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1427494Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1427621Z output = model(*input) 2023-01-11T22:54:21.1427998Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1428184Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1428525Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1428669Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1429045Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1429168Z output = model(*input) 2023-01-11T22:54:21.1429546Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1429710Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1430041Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1430182Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1430557Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1430681Z _lazy_init(state, module) 2023-01-11T22:54:21.1431052Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1431226Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1431580Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1431796Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1432173Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1432297Z _lazy_init(state, module) 2023-01-11T22:54:21.1432699Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1432905Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1433277Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1433448Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1433788Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1433898Z return func(*args, **kwargs) 2023-01-11T22:54:21.1434296Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1434442Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1434824Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1434931Z p_assert( 2023-01-11T22:54:21.1435267Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1435397Z return func(*args, **kwargs) 2023-01-11T22:54:21.1435735Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1435847Z traceback.print_stack() 2023-01-11T22:54:21.1436223Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1436325Z p_assert( 2023-01-11T22:54:21.1436660Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1436791Z traceback.print_stack() 2023-01-11T22:54:21.1437030Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1437267Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1437399Z File "", line 1, in 2023-01-11T22:54:21.1437510Z File "", line 1, in 2023-01-11T22:54:21.1437725Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1437870Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1438072Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1438223Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1438434Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1438576Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1438794Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1438882Z self.run() 2023-01-11T22:54:21.1439083Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1439235Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1439435Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1439586Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1439800Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1439904Z self.run() 2023-01-11T22:54:21.1440233Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1440369Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1440574Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1440784Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1441161Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1441288Z getattr(self, test_name)() 2023-01-11T22:54:21.1441627Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in 
_run 2023-01-11T22:54:21.1441763Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1442189Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1442299Z fn() 2023-01-11T22:54:21.1442674Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1442839Z getattr(self, test_name)() 2023-01-11T22:54:21.1443212Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1443342Z test(self, **param_kwargs) 2023-01-11T22:54:21.1443703Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1443801Z fn() 2023-01-11T22:54:21.1444139Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1444266Z return func(*args, **kwargs) 2023-01-11T22:54:21.1444637Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1444762Z test(self, **param_kwargs) 2023-01-11T22:54:21.1445014Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1445128Z self.run_subtests( 2023-01-11T22:54:21.1445488Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1445596Z return func(*args, **kwargs) 2023-01-11T22:54:21.1445953Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1446113Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1446365Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 
2023-01-11T22:54:21.1446479Z self.run_subtests( 2023-01-11T22:54:21.1446846Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1447002Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1447350Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1447510Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1447867Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1447994Z output = model(*input) 2023-01-11T22:54:21.1448359Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1448513Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1448840Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1448984Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1449359Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1449478Z output = model(*input) 2023-01-11T22:54:21.1449839Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1450015Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1450342Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1450544Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1450919Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1451042Z _lazy_init(state, module) 2023-01-11T22:54:21.1451464Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1451645Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1451991Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1452163Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1452530Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1452659Z _lazy_init(state, module) 2023-01-11T22:54:21.1453225Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1453377Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1453737Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1453908Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1454233Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1454363Z return func(*args, **kwargs) 2023-01-11T22:54:21.1454758Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1454900Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1455280Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1455392Z p_assert( 2023-01-11T22:54:21.1455725Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1455852Z return func(*args, **kwargs) 2023-01-11T22:54:21.1456192Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1456324Z traceback.print_stack() 2023-01-11T22:54:21.1456703Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1456808Z p_assert( 2023-01-11T22:54:21.1457142Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1457269Z traceback.print_stack() 2023-01-11T22:54:21.1457510Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1457751Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1457864Z File "", line 1, in 2023-01-11T22:54:21.1458075Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1458221Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1458425Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1458581Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1458793Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1458898Z self.run() 2023-01-11T22:54:21.1459083Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1459232Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1459573Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1459808Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1460180Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1460308Z getattr(self, test_name)() 2023-01-11T22:54:21.1460665Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1460767Z fn() 2023-01-11T22:54:21.1461172Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1461308Z test(self, **param_kwargs) 2023-01-11T22:54:21.1461674Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1461801Z return func(*args, **kwargs) 2023-01-11T22:54:21.1461934Z File "", line 1, in 2023-01-11T22:54:21.1462187Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1462307Z self.run_subtests( 2023-01-11T22:54:21.1462657Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1462804Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1463018Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1463164Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1463527Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1463683Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1463889Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1464042Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1464419Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1464527Z output = model(*input) 2023-01-11T22:54:21.1464741Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1464846Z self.run() 2023-01-11T22:54:21.1465177Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1465319Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1465531Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1465680Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1466058Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1466214Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1466553Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1466692Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1467058Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1467183Z _lazy_init(state, module) 2023-01-11T22:54:21.1467546Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1467670Z getattr(self, test_name)() 2023-01-11T22:54:21.1468027Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1468180Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1468541Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1468640Z fn() 2023-01-11T22:54:21.1469039Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1469248Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1469622Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1469748Z 
test(self, **param_kwargs) 2023-01-11T22:54:21.1470089Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1470253Z return func(*args, **kwargs) 2023-01-11T22:54:21.1470626Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1470756Z return func(*args, **kwargs) 2023-01-11T22:54:21.1471138Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1471244Z p_assert( 2023-01-11T22:54:21.1471496Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1471617Z self.run_subtests( 2023-01-11T22:54:21.1471955Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1472065Z traceback.print_stack() 2023-01-11T22:54:21.1472418Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1472584Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1472948Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1473104Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1473481Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1473601Z output = model(*input) 2023-01-11T22:54:21.1473926Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1474052Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1474429Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:54:21.1474607Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1474978Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1475101Z _lazy_init(state, module) 2023-01-11T22:54:21.1475455Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1475622Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1476019Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1476150Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1476491Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1476619Z return func(*args, **kwargs) 2023-01-11T22:54:21.1476997Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1477105Z p_assert( 2023-01-11T22:54:21.1477444Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1477571Z traceback.print_stack() 2023-01-11T22:54:21.1477809Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1478027Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1478778Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1479629Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1479769Z File "", line 1, in 2023-01-11T22:54:21.1479900Z File "", line 1, in 2023-01-11T22:54:21.1480119Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1480265Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1480469Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1480621Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1480818Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1480960Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1481177Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1481283Z self.run() 2023-01-11T22:54:21.1481486Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1481640Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1481842Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1481990Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1482187Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1482291Z self.run() 2023-01-11T22:54:21.1482642Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1482782Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1482986Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1483134Z 
self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1483498Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1483605Z getattr(self, test_name)() 2023-01-11T22:54:21.1483950Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1484086Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1484452Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1484551Z fn() 2023-01-11T22:54:21.1484913Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1485037Z getattr(self, test_name)() 2023-01-11T22:54:21.1485407Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1485513Z test(self, **param_kwargs) 2023-01-11T22:54:21.1485873Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1485973Z fn() 2023-01-11T22:54:21.1486333Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1486463Z return func(*args, **kwargs) 2023-01-11T22:54:21.1486830Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1486954Z test(self, **param_kwargs) 2023-01-11T22:54:21.1487204Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1487302Z self.run_subtests( 2023-01-11T22:54:21.1487734Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1487862Z return func(*args, **kwargs) 
2023-01-11T22:54:21.1488218Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1488386Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1488684Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 183, in test_delayed_optim_step 2023-01-11T22:54:21.1488806Z self.run_subtests( 2023-01-11T22:54:21.1489175Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1489312Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1489663Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1489830Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1490207Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1490329Z output = model(*input) 2023-01-11T22:54:21.1490693Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1490846Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1491174Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1491298Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1491669Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1491789Z output = model(*input) 2023-01-11T22:54:21.1492170Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1492351Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1492680Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1492821Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1493397Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1493509Z _lazy_init(state, module) 2023-01-11T22:54:21.1493887Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1494062Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1494420Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1494591Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1494964Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1495087Z _lazy_init(state, module) 2023-01-11T22:54:21.1495487Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1495631Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1495967Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1496135Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1496472Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1496598Z return func(*args, **kwargs) 2023-01-11T22:54:21.1496996Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1497228Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1497615Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1497722Z p_assert( 2023-01-11T22:54:21.1498036Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1498163Z return func(*args, **kwargs) 2023-01-11T22:54:21.1498700Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1498841Z traceback.print_stack() 2023-01-11T22:54:21.1499226Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1499330Z p_assert( 2023-01-11T22:54:21.1499667Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1499778Z traceback.print_stack() 2023-01-11T22:54:21.1500018Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1500256Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1500484Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1500720Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1500958Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1501189Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1501414Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1501638Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1501848Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1502084Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1502312Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1502536Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1502764Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1502991Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1503219Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1503445Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1503651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1503877Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1504104Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1504331Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1505096Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1505831Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1506648Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1507434Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1508174Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1508896Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1509928Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:224: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 
2023-01-11T22:54:21.1510301Z local_num_valid_indices = torch.tensor([num_valid_indices], **tensor_kwargs) # type: ignore[arg-type, call-overload] 2023-01-11T22:54:21.1510536Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1510775Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1511007Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1511239Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1511448Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1511676Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1511904Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1512137Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1512363Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1512585Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1512816Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1513043Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1513250Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1513478Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1513703Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1513929Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1514156Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1514381Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1514670Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1514893Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1515102Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1515329Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1515596Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1515826Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1516587Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1517338Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1518070Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1518808Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1519537Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1520272Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1520993Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1521733Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1522456Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1523191Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1523980Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1524757Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1525488Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1526220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1526945Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1527673Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1528403Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1529135Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1529857Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1530590Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1531310Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1532036Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1532331Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1532568Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1532803Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1533188Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1533311Z dist init r=1, world=2 2023-01-11T22:54:21.1533715Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1534057Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1534372Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1534684Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1535008Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1535326Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1535629Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1535955Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1536272Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1536562Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1536884Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1537193Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1537306Z dist init r=0, world=2 2023-01-11T22:54:21.1537615Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1537931Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1538235Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1538541Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1538858Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1539167Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1539472Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1539838Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1540147Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1540493Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1540814Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1541126Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.1541235Z ok (35.373s) 2023-01-11T22:54:21.1541457Z test_delayed_reduce_scatter_offload_false_no_shard (__main__.TestParityWithDDP) 2023-01-11T22:54:21.1541774Z Tests the FSDP forward, backward, and optimizer step runtime by ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83765 2023-01-11T22:54:21.1542075Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83766 2023-01-11T22:54:21.1542473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.1542634Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.1543019Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.1543213Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.1543580Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.1543760Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.1544136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.1544328Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.1544579Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.1544805Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.1545206Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.1545599Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.1545835Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.1546064Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.1546301Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1546536Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1547566Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.1547681Z warnings.warn( 2023-01-11T22:54:21.1548686Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.1548861Z warnings.warn( 2023-01-11T22:54:21.1549144Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1549366Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1549601Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1549833Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1550066Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1550295Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1550523Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1550752Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1550982Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1551189Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1551411Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1551639Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1551867Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1552097Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1552322Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1552545Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1553314Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1554059Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1554794Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1555538Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1556262Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1557076Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1557851Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1558605Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1559343Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1560080Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1560801Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1561529Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1562258Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1562987Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1563707Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1564446Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1565170Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1565882Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1566709Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1567451Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1568174Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1568910Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1569630Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1570361Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1571091Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1571817Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1572538Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1573443Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1574176Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1574904Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1575776Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1576523Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1577249Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1577987Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1578707Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1579427Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1580147Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1580875Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1581592Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1582323Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1583042Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1583775Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1584621Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1585356Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1585595Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1585837Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1586071Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1586304Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1586535Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1586763Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1586972Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1587203Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1587430Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1587655Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1587883Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1588111Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1588337Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1588567Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1588790Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1588993Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1589220Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1589447Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1589678Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1589903Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1590127Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1590352Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1590581Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1590789Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1591530Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1592327Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1593106Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1593845Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1594582Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1595305Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1596036Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1596766Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1597497Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1598220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1598957Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1599682Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1600408Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1601198Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1601958Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1602686Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1603426Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1604151Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1604878Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1605605Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1606335Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1607055Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1607781Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1608501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1609226Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1609947Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1610772Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1611508Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1612235Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1613133Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1613874Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1614594Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1615330Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1616050Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1616775Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1617503Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1618230Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1618954Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1619830Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1620570Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1621299Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1622030Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1622753Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1623475Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1624207Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1624931Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1625654Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1626377Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1627105Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1627823Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1628658Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1629391Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1630112Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1630839Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1631563Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1632282Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1633018Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1633734Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1634459Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1635184Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1635908Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1636625Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1636920Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1637161Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1637414Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1637651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1637885Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1638115Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1638229Z dist init r=1, world=2 2023-01-11T22:54:21.1638345Z dist init r=0, world=2 2023-01-11T22:54:21.1638447Z ok (5.014s) 2023-01-11T22:54:21.1638663Z test_delayed_reduce_scatter_offload_false_none (__main__.TestParityWithDDP) 2023-01-11T22:54:21.1639569Z Tests the FSDP forward, backward, and optimizer step runtime by ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82704 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T22:54:21.1639798Z test_delayed_reduce_scatter_offload_false_shard_grad_op (__main__.TestParityWithDDP) 2023-01-11T22:54:21.1640683Z Tests the FSDP forward, backward, and optimizer step runtime by ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82398 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T22:54:21.1640907Z test_delayed_reduce_scatter_offload_true_no_shard (__main__.TestParityWithDDP) 2023-01-11T22:54:21.1641222Z Tests the FSDP forward, backward, and optimizer step runtime by ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83848 2023-01-11T22:54:21.1641442Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83849 2023-01-11T22:54:21.1641821Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.1642007Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.1642391Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.1642584Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.1642951Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.1643113Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.1643489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.1643680Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.1643933Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.1644182Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.1644582Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.1644977Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.1645278Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.1645503Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.1645721Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1645956Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1647033Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.1647159Z warnings.warn( 2023-01-11T22:54:21.1648180Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.1648298Z warnings.warn( 2023-01-11T22:54:21.1648432Z File "", line 1, in 2023-01-11T22:54:21.1648649Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1648794Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1649003Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1649137Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1649355Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1649462Z self.run() 2023-01-11T22:54:21.1649663Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1649811Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1650159Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1650294Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1650662Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1650772Z getattr(self, test_name)() 2023-01-11T22:54:21.1651135Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1651236Z fn() 2023-01-11T22:54:21.1651606Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1651737Z test(self, **param_kwargs) 2023-01-11T22:54:21.1652099Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1652225Z return func(*args, **kwargs) 2023-01-11T22:54:21.1652480Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1652577Z self.run_subtests( 2023-01-11T22:54:21.1653100Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1653274Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1653650Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1653806Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1654186Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1654394Z output = model(*input) 2023-01-11T22:54:21.1654728Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1654851Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1655232Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1655469Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1655858Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1655994Z _lazy_init(state, module) 2023-01-11T22:54:21.1656351Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1656520Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1656903Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1657054Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1657419Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1657546Z return func(*args, **kwargs) 2023-01-11T22:54:21.1657930Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1658037Z p_assert( 2023-01-11T22:54:21.1658377Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1658506Z traceback.print_stack() 2023-01-11T22:54:21.1658619Z File "", line 1, in 2023-01-11T22:54:21.1658832Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1658979Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1659187Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1659341Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1659555Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1659661Z self.run() 2023-01-11T22:54:21.1659844Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1659998Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1660345Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1660479Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1660840Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1660965Z getattr(self, test_name)() 2023-01-11T22:54:21.1661325Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1661431Z fn() 2023-01-11T22:54:21.1661780Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1661909Z test(self, **param_kwargs) 2023-01-11T22:54:21.1662269Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.1662401Z return func(*args, **kwargs) 2023-01-11T22:54:21.1662655Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1662771Z self.run_subtests( 2023-01-11T22:54:21.1663124Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1663288Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1663634Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1663857Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1664243Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1664365Z output = model(*input) 2023-01-11T22:54:21.1664693Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1664886Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1665278Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1665455Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1665802Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1665928Z _lazy_init(state, module) 2023-01-11T22:54:21.1666290Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1666460Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1666860Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1667004Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.1667348Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1667477Z return func(*args, **kwargs) 2023-01-11T22:54:21.1667856Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1667943Z p_assert( 2023-01-11T22:54:21.1668277Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1668410Z traceback.print_stack() 2023-01-11T22:54:21.1668651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1668889Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1669023Z File "", line 1, in 2023-01-11T22:54:21.1669234Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1669363Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1669567Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1669721Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1669936Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1670042Z self.run() 2023-01-11T22:54:21.1670246Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1670395Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1670740Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1670857Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1671223Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1671350Z getattr(self, test_name)() 2023-01-11T22:54:21.1671715Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1671815Z fn() 2023-01-11T22:54:21.1672181Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1672306Z test(self, **param_kwargs) 2023-01-11T22:54:21.1672664Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1672772Z return func(*args, **kwargs) 2023-01-11T22:54:21.1673089Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1673205Z self.run_subtests( 2023-01-11T22:54:21.1673563Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1673726Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1674139Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1674301Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1674680Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1674783Z output = model(*input) 2023-01-11T22:54:21.1675113Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1675258Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1675639Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1675816Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1676181Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.1676307Z _lazy_init(state, module) 2023-01-11T22:54:21.1676659Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1676811Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1677210Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1677353Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1677692Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1677850Z return func(*args, **kwargs) 2023-01-11T22:54:21.1678230Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1678336Z p_assert( 2023-01-11T22:54:21.1678675Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1678787Z traceback.print_stack() 2023-01-11T22:54:21.1678921Z File "", line 1, in 2023-01-11T22:54:21.1679133Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1679281Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1679485Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1679638Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1679853Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1679944Z self.run() 2023-01-11T22:54:21.1680148Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1680295Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1680640Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1680778Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.1681142Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1681266Z getattr(self, test_name)() 2023-01-11T22:54:21.1681625Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1681707Z fn() 2023-01-11T22:54:21.1682074Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1682267Z test(self, **param_kwargs) 2023-01-11T22:54:21.1682634Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1682760Z return func(*args, **kwargs) 2023-01-11T22:54:21.1683012Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1683127Z self.run_subtests( 2023-01-11T22:54:21.1683542Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1683695Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1684068Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1684224Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1684603Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1684733Z output = model(*input) 2023-01-11T22:54:21.1685062Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1685204Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1685584Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.1685745Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1686113Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1686233Z _lazy_init(state, module) 2023-01-11T22:54:21.1686589Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1686760Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1687163Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1687309Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1687650Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1687757Z return func(*args, **kwargs) 2023-01-11T22:54:21.1688143Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1688251Z p_assert( 2023-01-11T22:54:21.1688590Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1688719Z traceback.print_stack() 2023-01-11T22:54:21.1688958Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1689197Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1689333Z File "", line 1, in 2023-01-11T22:54:21.1689526Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1689667Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1689868Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1690022Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1690240Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1690347Z self.run() 2023-01-11T22:54:21.1690547Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1690696Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1691023Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1691159Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1691592Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1691716Z getattr(self, test_name)() 2023-01-11T22:54:21.1692078Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1692179Z fn() 2023-01-11T22:54:21.1692594Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1692728Z test(self, **param_kwargs) 2023-01-11T22:54:21.1693243Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1693372Z return func(*args, **kwargs) 2023-01-11T22:54:21.1693628Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1693748Z self.run_subtests( 2023-01-11T22:54:21.1694110Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1694276Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1694639Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1694791Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1695150Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1695274Z output = model(*input) 2023-01-11T22:54:21.1695599Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1695740Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1696116Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1696289Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1696662Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1696785Z _lazy_init(state, module) 2023-01-11T22:54:21.1697118Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1697289Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1697692Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1697838Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1698176Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1698300Z return func(*args, **kwargs) 2023-01-11T22:54:21.1698679Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1698788Z p_assert( 2023-01-11T22:54:21.1699106Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1699235Z traceback.print_stack() 2023-01-11T22:54:21.1699365Z File "", line 1, in 2023-01-11T22:54:21.1699575Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1699719Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1699920Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1700069Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1700266Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1700372Z self.run() 2023-01-11T22:54:21.1700576Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1700822Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1701171Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1701303Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1701663Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1701787Z getattr(self, test_name)() 2023-01-11T22:54:21.1702189Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1702301Z fn() 2023-01-11T22:54:21.1702676Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1702804Z test(self, **param_kwargs) 2023-01-11T22:54:21.1703161Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.1703293Z return func(*args, **kwargs) 2023-01-11T22:54:21.1703548Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1703664Z self.run_subtests( 2023-01-11T22:54:21.1703999Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1704164Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1704530Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1704684Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1705061Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1705181Z output = model(*input) 2023-01-11T22:54:21.1705508Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1705651Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1706008Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1706183Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1706550Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1706676Z _lazy_init(state, module) 2023-01-11T22:54:21.1707031Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1707202Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1707602Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1707748Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.1708073Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1708201Z return func(*args, **kwargs) 2023-01-11T22:54:21.1708579Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1708683Z p_assert( 2023-01-11T22:54:21.1709023Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1709151Z traceback.print_stack() 2023-01-11T22:54:21.1709389Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1709625Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1709738Z File "", line 1, in 2023-01-11T22:54:21.1709948Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1710092Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1710359Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1710514Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1710730Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1710833Z self.run() 2023-01-11T22:54:21.1711035Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1711210Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1711570Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1711705Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1712069Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1712194Z getattr(self, test_name)() 2023-01-11T22:54:21.1712555Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1712658Z fn() 2023-01-11T22:54:21.1713005Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1713129Z test(self, **param_kwargs) 2023-01-11T22:54:21.1713482Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1713613Z return func(*args, **kwargs) 2023-01-11T22:54:21.1713868Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1713982Z self.run_subtests( 2023-01-11T22:54:21.1714339Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1714504Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1714853Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1715012Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1715388Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1715509Z output = model(*input) 2023-01-11T22:54:21.1715835Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1715977Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1716351Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1716527Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1716894Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.1717000Z _lazy_init(state, module) 2023-01-11T22:54:21.1717357Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1717523Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1717922Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1718066Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1718405Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1718530Z return func(*args, **kwargs) 2023-01-11T22:54:21.1718910Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1718996Z p_assert( 2023-01-11T22:54:21.1719332Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1719533Z traceback.print_stack() 2023-01-11T22:54:21.1719662Z File "", line 1, in 2023-01-11T22:54:21.1719874Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1720018Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1720222Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1720356Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1720654Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1720763Z self.run() 2023-01-11T22:54:21.1720971Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1721119Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1721471Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1721607Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.1721973Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1722079Z getattr(self, test_name)() 2023-01-11T22:54:21.1722440Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1722540Z fn() 2023-01-11T22:54:21.1722907Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1723032Z test(self, **param_kwargs) 2023-01-11T22:54:21.1723394Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1723525Z return func(*args, **kwargs) 2023-01-11T22:54:21.1723778Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1723875Z self.run_subtests( 2023-01-11T22:54:21.1724233Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1724397Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1724758Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1724913Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1725290Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1725413Z output = model(*input) 2023-01-11T22:54:21.1725742Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1725864Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1726240Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.1726421Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1726789Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1726912Z _lazy_init(state, module) 2023-01-11T22:54:21.1727266Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1727438Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1727838Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1727965Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1728306Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1728433Z return func(*args, **kwargs) 2023-01-11T22:54:21.1728811Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1728982Z p_assert( 2023-01-11T22:54:21.1729326Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1729456Z traceback.print_stack() 2023-01-11T22:54:21.1729694Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1729962Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1730101Z File "", line 1, in 2023-01-11T22:54:21.1730315Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1730460Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1730664Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1730817Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1731036Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1731142Z self.run() 2023-01-11T22:54:21.1731327Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1731475Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1731826Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1731965Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1732332Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1732458Z getattr(self, test_name)() 2023-01-11T22:54:21.1732822Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1733057Z fn() 2023-01-11T22:54:21.1733440Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1733571Z test(self, **param_kwargs) 2023-01-11T22:54:21.1733924Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1734050Z return func(*args, **kwargs) 2023-01-11T22:54:21.1734306Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1734420Z self.run_subtests( 2023-01-11T22:54:21.1734778Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1734924Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1735286Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1735444Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1735819Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1735945Z output = model(*input) 2023-01-11T22:54:21.1736270Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1736411Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1736785Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1736943Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1737312Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1737436Z _lazy_init(state, module) 2023-01-11T22:54:21.1737789Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1737959Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1738454Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1738601Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1738940Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1739066Z return func(*args, **kwargs) 2023-01-11T22:54:21.1739487Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1739601Z p_assert( 2023-01-11T22:54:21.1739946Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1740074Z traceback.print_stack() 2023-01-11T22:54:21.1740206Z File "", line 1, in 2023-01-11T22:54:21.1740420Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1740567Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1740752Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1740904Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1741120Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1741225Z self.run() 2023-01-11T22:54:21.1741431Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1741579Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1741921Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1742057Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1742399Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1742525Z getattr(self, test_name)() 2023-01-11T22:54:21.1742890Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1742991Z fn() 2023-01-11T22:54:21.1743391Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1743518Z test(self, **param_kwargs) 2023-01-11T22:54:21.1743881Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.1744010Z return func(*args, **kwargs) 2023-01-11T22:54:21.1744246Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1744363Z self.run_subtests( 2023-01-11T22:54:21.1744720Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1744884Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1745253Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1745403Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1745775Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1745896Z output = model(*input) 2023-01-11T22:54:21.1746210Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1746351Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1746733Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1746910Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1747279Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1747477Z _lazy_init(state, module) 2023-01-11T22:54:21.1747839Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1748012Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1748394Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1748596Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.1748952Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1749078Z return func(*args, **kwargs) 2023-01-11T22:54:21.1749456Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1749561Z p_assert( 2023-01-11T22:54:21.1749896Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1750030Z traceback.print_stack() 2023-01-11T22:54:21.1750251Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1750490Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1750623Z File "", line 1, in 2023-01-11T22:54:21.1750835Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1750983Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1751189Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1751343Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1751558Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1751645Z self.run() 2023-01-11T22:54:21.1751849Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1752002Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1752342Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1752473Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1752837Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1752962Z getattr(self, test_name)() 2023-01-11T22:54:21.1753306Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1753409Z fn() 2023-01-11T22:54:21.1753771Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1753895Z test(self, **param_kwargs) 2023-01-11T22:54:21.1754246Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1754377Z return func(*args, **kwargs) 2023-01-11T22:54:21.1754717Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1754830Z self.run_subtests( 2023-01-11T22:54:21.1755165Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1755330Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1755698Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1755853Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1756225Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1756344Z output = model(*input) 2023-01-11T22:54:21.1756669Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1756878Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1757242Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1757420Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1757815Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.1757990Z _lazy_init(state, module) 2023-01-11T22:54:21.1758356Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1758527Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1758927Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1759072Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1759418Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1759526Z return func(*args, **kwargs) 2023-01-11T22:54:21.1759904Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1760009Z p_assert( 2023-01-11T22:54:21.1760348Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1760477Z traceback.print_stack() 2023-01-11T22:54:21.1760606Z File "", line 1, in 2023-01-11T22:54:21.1760815Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1760941Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1761143Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1761294Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1761510Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1761613Z self.run() 2023-01-11T22:54:21.1761816Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1761967Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1762308Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1762430Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.1762794Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1762921Z getattr(self, test_name)() 2023-01-11T22:54:21.1763284Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1763383Z fn() 2023-01-11T22:54:21.1763746Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1763874Z test(self, **param_kwargs) 2023-01-11T22:54:21.1764229Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1764337Z return func(*args, **kwargs) 2023-01-11T22:54:21.1764590Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1764707Z self.run_subtests( 2023-01-11T22:54:21.1765060Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1765223Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1765588Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1765742Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1766114Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1766281Z output = model(*input) 2023-01-11T22:54:21.1766613Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1766754Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1767180Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.1767362Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1767737Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1767861Z _lazy_init(state, module) 2023-01-11T22:54:21.1768213Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1768364Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1768768Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1768910Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1769249Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1769375Z return func(*args, **kwargs) 2023-01-11T22:54:21.1769758Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1769862Z p_assert( 2023-01-11T22:54:21.1770199Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1770310Z traceback.print_stack() 2023-01-11T22:54:21.1770546Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1770783Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1770920Z File "", line 1, in 2023-01-11T22:54:21.1771132Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1771278Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1771482Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1771618Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1771832Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1771939Z self.run() 2023-01-11T22:54:21.1772141Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1772289Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1772631Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1772770Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1773300Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1773411Z getattr(self, test_name)() 2023-01-11T22:54:21.1773777Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1773878Z fn() 2023-01-11T22:54:21.1774250Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1774375Z test(self, **param_kwargs) 2023-01-11T22:54:21.1774734Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1774860Z return func(*args, **kwargs) 2023-01-11T22:54:21.1775114Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1775211Z self.run_subtests( 2023-01-11T22:54:21.1775660Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1775822Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1776190Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1776344Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1776778Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1776908Z output = model(*input) 2023-01-11T22:54:21.1777244Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1777367Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1777745Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1777926Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1778296Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1778419Z _lazy_init(state, module) 2023-01-11T22:54:21.1778773Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1778948Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1779344Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1779471Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1779812Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1779969Z return func(*args, **kwargs) 2023-01-11T22:54:21.1780349Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1780500Z p_assert( 2023-01-11T22:54:21.1780836Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1780965Z traceback.print_stack() 2023-01-11T22:54:21.1781096Z File "", line 1, in 2023-01-11T22:54:21.1781292Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1781438Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1781641Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1781795Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1782010Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1782119Z self.run() 2023-01-11T22:54:21.1782325Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1782476Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1782802Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1782936Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1783299Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1783427Z getattr(self, test_name)() 2023-01-11T22:54:21.1783787Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1783888Z fn() 2023-01-11T22:54:21.1784255Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1784361Z test(self, **param_kwargs) 2023-01-11T22:54:21.1784718Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.1784912Z return func(*args, **kwargs) 2023-01-11T22:54:21.1785167Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1785282Z self.run_subtests( 2023-01-11T22:54:21.1785641Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1785917Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1786291Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1786427Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1786799Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1786925Z output = model(*input) 2023-01-11T22:54:21.1787249Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1787393Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1787766Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1787943Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1788314Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1788441Z _lazy_init(state, module) 2023-01-11T22:54:21.1788778Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1788949Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1789349Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1789495Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.1789839Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1789968Z return func(*args, **kwargs) 2023-01-11T22:54:21.1790346Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1790450Z p_assert( 2023-01-11T22:54:21.1790770Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1790901Z traceback.print_stack() 2023-01-11T22:54:21.1791139Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1791374Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1791506Z File "", line 1, in 2023-01-11T22:54:21.1791721Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1791872Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1792056Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1792209Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1792440Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1792544Z self.run() 2023-01-11T22:54:21.1792749Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1792898Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1793245Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1793381Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1793728Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1793855Z getattr(self, test_name)() 2023-01-11T22:54:21.1794283Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1794387Z fn() 2023-01-11T22:54:21.1794752Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1794878Z test(self, **param_kwargs) 2023-01-11T22:54:21.1795277Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1795412Z return func(*args, **kwargs) 2023-01-11T22:54:21.1795649Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1795765Z self.run_subtests( 2023-01-11T22:54:21.1796127Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1796291Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1796658Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1796809Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1797183Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1797305Z output = model(*input) 2023-01-11T22:54:21.1797616Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1797755Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1798135Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1798312Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1798682Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.1798808Z _lazy_init(state, module) 2023-01-11T22:54:21.1799160Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1799328Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1799710Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1799857Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1800196Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1800322Z return func(*args, **kwargs) 2023-01-11T22:54:21.1800703Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1800808Z p_assert( 2023-01-11T22:54:21.1801144Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1801277Z traceback.print_stack() 2023-01-11T22:54:21.1801389Z File "", line 1, in 2023-01-11T22:54:21.1801601Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1801746Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1801951Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1802106Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1802320Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1802423Z self.run() 2023-01-11T22:54:21.1802606Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1802754Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1803097Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1803295Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.1803664Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1803788Z getattr(self, test_name)() 2023-01-11T22:54:21.1804147Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1804248Z fn() 2023-01-11T22:54:21.1804641Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1804774Z test(self, **param_kwargs) 2023-01-11T22:54:21.1805137Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1805264Z return func(*args, **kwargs) 2023-01-11T22:54:21.1805520Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1805639Z self.run_subtests( 2023-01-11T22:54:21.1805990Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1806154Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1806499Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1806654Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1807031Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1807155Z output = model(*input) 2023-01-11T22:54:21.1807479Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1807619Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1807995Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.1808173Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1808520Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1808642Z _lazy_init(state, module) 2023-01-11T22:54:21.1808993Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1809164Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1809563Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1809709Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1810048Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1810202Z return func(*args, **kwargs) 2023-01-11T22:54:21.1810584Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1810670Z p_assert( 2023-01-11T22:54:21.1811008Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1811137Z traceback.print_stack() 2023-01-11T22:54:21.1811376Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1811618Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1812372Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1813262Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1814173Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1814918Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1815750Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1816482Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1817222Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1817949Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1818144Z File "", line 1, in 2023-01-11T22:54:21.1818360Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1818490Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1818696Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1818848Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1819064Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1819170Z self.run() 2023-01-11T22:54:21.1819375Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1819520Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1819853Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1819987Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1820354Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1820480Z getattr(self, test_name)() 2023-01-11T22:54:21.1820844Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1820943Z fn() 2023-01-11T22:54:21.1821313Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1821438Z test(self, **param_kwargs) 2023-01-11T22:54:21.1821776Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1821904Z return func(*args, **kwargs) 2023-01-11T22:54:21.1822225Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1822339Z self.run_subtests( 2023-01-11T22:54:21.1822701Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1822868Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1823290Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1823453Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1823816Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1823937Z output = model(*input) 2023-01-11T22:54:21.1824268Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1824412Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1824790Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1824966Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1825331Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1825455Z _lazy_init(state, module) 2023-01-11T22:54:21.1825793Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1825966Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1826368Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 
178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1826510Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1826851Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1826983Z return func(*args, **kwargs) 2023-01-11T22:54:21.1827362Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1827466Z p_assert( 2023-01-11T22:54:21.1827785Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1827914Z traceback.print_stack() 2023-01-11T22:54:21.1828044Z File "", line 1, in 2023-01-11T22:54:21.1828253Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1828395Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1828600Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1828753Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1828969Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1829060Z self.run() 2023-01-11T22:54:21.1829260Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1829406Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1829755Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1829894Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1830260Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1830388Z getattr(self, test_name)() 2023-01-11T22:54:21.1830750Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1830832Z fn() 
2023-01-11T22:54:21.1831196Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1831386Z test(self, **param_kwargs) 2023-01-11T22:54:21.1831751Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1831876Z return func(*args, **kwargs) 2023-01-11T22:54:21.1832132Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1832248Z self.run_subtests( 2023-01-11T22:54:21.1832632Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1832806Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1833182Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1833337Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1833712Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1833840Z output = model(*input) 2023-01-11T22:54:21.1834166Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1834305Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1834681Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1834842Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1835209Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1835333Z _lazy_init(state, module) 2023-01-11T22:54:21.1835691Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1835863Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1836266Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1836410Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1836753Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1836862Z return func(*args, **kwargs) 2023-01-11T22:54:21.1837249Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1837358Z p_assert( 2023-01-11T22:54:21.1837696Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1837826Z traceback.print_stack() 2023-01-11T22:54:21.1838065Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1838302Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1838439Z File "", line 1, in 2023-01-11T22:54:21.1838634Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1838780Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1838980Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1839131Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1839344Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1839451Z self.run() 2023-01-11T22:54:21.1839655Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1839787Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1840134Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1840268Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1840703Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1840829Z getattr(self, test_name)() 2023-01-11T22:54:21.1841190Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1841290Z fn() 2023-01-11T22:54:21.1841701Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1841814Z test(self, **param_kwargs) 2023-01-11T22:54:21.1842181Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1842308Z return func(*args, **kwargs) 2023-01-11T22:54:21.1842561Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1842676Z self.run_subtests( 2023-01-11T22:54:21.1843030Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1843199Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1843564Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1843699Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1844075Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1844197Z output = model(*input) 2023-01-11T22:54:21.1844526Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1844666Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1845044Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1845220Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1845594Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1845699Z _lazy_init(state, module) 2023-01-11T22:54:21.1846054Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1846225Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1846630Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1846775Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1847113Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1847241Z return func(*args, **kwargs) 2023-01-11T22:54:21.1847621Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1847710Z p_assert( 2023-01-11T22:54:21.1848046Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1848171Z traceback.print_stack() 2023-01-11T22:54:21.1848303Z File "", line 1, in 2023-01-11T22:54:21.1848514Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1848661Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1848864Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1849017Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1849211Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1849318Z self.run() 2023-01-11T22:54:21.1849521Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1849733Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1850082Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1850218Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1850582Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1850707Z getattr(self, test_name)() 2023-01-11T22:54:21.1851096Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1851202Z fn() 2023-01-11T22:54:21.1851578Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1851703Z test(self, **param_kwargs) 2023-01-11T22:54:21.1852058Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.1852189Z return func(*args, **kwargs) 2023-01-11T22:54:21.1852443Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1852540Z self.run_subtests( 2023-01-11T22:54:21.1853341Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1853530Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1853911Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1854064Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1854437Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1854560Z output = model(*input) 2023-01-11T22:54:21.1854886Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1855013Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1855390Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1855567Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1855939Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1856066Z _lazy_init(state, module) 2023-01-11T22:54:21.1856416Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1856589Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1856988Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1857130Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.1857456Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1857585Z return func(*args, **kwargs) 2023-01-11T22:54:21.1857965Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1858069Z p_assert( 2023-01-11T22:54:21.1858428Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1858560Z traceback.print_stack() 2023-01-11T22:54:21.1858803Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1859040Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1859153Z File "", line 1, in 2023-01-11T22:54:21.1859364Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1859510Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1859820Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1859973Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1860187Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1860293Z self.run() 2023-01-11T22:54:21.1860479Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1860684Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1861045Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1861178Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1861543Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1861668Z getattr(self, test_name)() 2023-01-11T22:54:21.1862031Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1862135Z fn() 2023-01-11T22:54:21.1862484Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1862610Z test(self, **param_kwargs) 2023-01-11T22:54:21.1862965Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1863096Z return func(*args, **kwargs) 2023-01-11T22:54:21.1863350Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1863466Z self.run_subtests( 2023-01-11T22:54:21.1863823Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1863986Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1864331Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1864490Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1864865Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1864986Z output = model(*input) 2023-01-11T22:54:21.1865318Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1865459Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1865834Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1866009Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1866358Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.1866482Z _lazy_init(state, module) 2023-01-11T22:54:21.1866839Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1867008Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1867406Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1867551Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1867893Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1868021Z return func(*args, **kwargs) 2023-01-11T22:54:21.1868380Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1868487Z p_assert( 2023-01-11T22:54:21.1868825Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1869055Z traceback.print_stack() 2023-01-11T22:54:21.1869184Z File "", line 1, in 2023-01-11T22:54:21.1869396Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1869540Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1869742Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1869877Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1870134Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1870250Z self.run() 2023-01-11T22:54:21.1870456Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1870602Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1870949Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1871084Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.1871433Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1871560Z getattr(self, test_name)() 2023-01-11T22:54:21.1871920Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1872018Z fn() 2023-01-11T22:54:21.1872386Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1872510Z test(self, **param_kwargs) 2023-01-11T22:54:21.1872865Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1872994Z return func(*args, **kwargs) 2023-01-11T22:54:21.1873228Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1873343Z self.run_subtests( 2023-01-11T22:54:21.1873701Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1873863Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1874223Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1874377Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1874754Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1874877Z output = model(*input) 2023-01-11T22:54:21.1875184Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1875322Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1875699Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.1875881Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1876249Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1876370Z _lazy_init(state, module) 2023-01-11T22:54:21.1876724Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1876896Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1877295Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1877422Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1877758Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1877886Z return func(*args, **kwargs) 2023-01-11T22:54:21.1878264Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1878431Z p_assert( 2023-01-11T22:54:21.1878775Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1878904Z traceback.print_stack() 2023-01-11T22:54:21.1879124Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1879424Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1880195Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1880930Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1881069Z File "", line 1, in 2023-01-11T22:54:21.1881283Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1881426Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1881630Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1881786Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1882001Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1882089Z self.run() 2023-01-11T22:54:21.1882295Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1882440Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1882783Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1882922Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1883287Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1883413Z getattr(self, test_name)() 2023-01-11T22:54:21.1883774Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1883857Z fn() 2023-01-11T22:54:21.1884219Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1884342Z test(self, **param_kwargs) 2023-01-11T22:54:21.1884696Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:54:21.1884824Z return func(*args, **kwargs) 2023-01-11T22:54:21.1885080Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1885203Z self.run_subtests( 2023-01-11T22:54:21.1885535Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1885699Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1886064Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1886218Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1886594Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1886715Z output = model(*input) 2023-01-11T22:54:21.1887043Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1887184Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1887632Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1887789Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1888156Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1888282Z _lazy_init(state, module) 2023-01-11T22:54:21.1888685Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.1888861Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1889267Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1889412Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.1889749Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1889865Z return func(*args, **kwargs) 2023-01-11T22:54:21.1890245Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1890348Z p_assert( 2023-01-11T22:54:21.1890685Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1890812Z traceback.print_stack() 2023-01-11T22:54:21.1890947Z File "", line 1, in 2023-01-11T22:54:21.1891160Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.1891304Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.1891490Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.1891640Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.1891850Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.1891958Z self.run() 2023-01-11T22:54:21.1892163Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.1892311Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.1892655Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.1892769Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.1893310Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.1893436Z getattr(self, test_name)() 2023-01-11T22:54:21.1893801Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.1893907Z fn() 2023-01-11T22:54:21.1894269Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.1894397Z test(self, **param_kwargs) 2023-01-11T22:54:21.1894754Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.1894863Z return func(*args, **kwargs) 2023-01-11T22:54:21.1895115Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 204, in test_delayed_reduce_scatter 2023-01-11T22:54:21.1895233Z self.run_subtests( 2023-01-11T22:54:21.1895591Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.1895755Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.1896117Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.1896270Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.1896648Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.1896845Z output = model(*input) 2023-01-11T22:54:21.1897180Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.1897320Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.1897697Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.1897934Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.1898318Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.1898442Z _lazy_init(state, module) 2023-01-11T22:54:21.1898795Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:54:21.1898946Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.1899342Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.1899495Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.1899836Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.1899962Z return func(*args, **kwargs) 2023-01-11T22:54:21.1900342Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.1900448Z p_assert( 2023-01-11T22:54:21.1900785Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.1900895Z traceback.print_stack() 2023-01-11T22:54:21.1901132Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1901370Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1901606Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1901845Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1902072Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1902297Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1902531Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1902742Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1902971Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1903198Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1903423Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1903654Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1903885Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1904115Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1904341Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1904567Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1904774Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1905002Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1905228Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1905452Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1906629Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 
2023-01-11T22:54:21.1906790Z world_indices[ 2023-01-11T22:54:21.1907834Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.1907947Z world_indices[ 2023-01-11T22:54:21.1908183Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1908411Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1908641Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1908855Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1909083Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1909310Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1909539Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1909763Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1909992Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1910218Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1910441Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1910649Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.1910884Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1911112Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1911334Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1911557Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1911784Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1912014Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1912244Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1912449Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1912673Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1912902Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1913127Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1913353Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1914116Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1914921Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1915707Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1916448Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1917180Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1917903Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1918641Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1919368Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1920105Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1920830Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1921568Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1922291Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1923024Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1923810Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1924588Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1925321Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1926055Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1926781Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1927510Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1928233Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1928962Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1929683Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1930412Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1931135Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1931865Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1932649Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1933607Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1934355Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1935085Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1935806Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1936535Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1937260Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1937994Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1938717Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1939446Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1940168Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1940893Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1941692Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1942478Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1943217Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1943951Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1944674Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1945399Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1946119Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1946840Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1947540Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1948268Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1948987Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1949716Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1950504Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1951275Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1952008Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1952736Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1953456Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1954178Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1954900Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1955630Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1956350Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1957082Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1957802Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1958561Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1959375Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1959710Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1959951Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1960182Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1960417Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.1960532Z dist init r=1, world=2 2023-01-11T22:54:21.1960864Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:54:21.1961186Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1961496Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1961813Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1962125Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1962452Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1962769Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1963078Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1963405Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1963717Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1964026Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:54:21.1964344Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.1964462Z dist init r=0, world=2 2023-01-11T22:54:21.1964771Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1965062Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1965374Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1965680Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1965982Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1966361Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1966708Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1967018Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1967336Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.1967646Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1967956Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1968270Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.1968355Z ok (5.415s) 2023-01-11T22:54:21.1968569Z test_delayed_reduce_scatter_offload_true_none (__main__.TestParityWithDDP) 2023-01-11T22:54:21.1969497Z Tests the FSDP forward, backward, and optimizer step runtime by ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82399 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T22:54:21.1969724Z test_delayed_reduce_scatter_offload_true_shard_grad_op (__main__.TestParityWithDDP) 2023-01-11T22:54:21.1970615Z Tests the FSDP forward, backward, and optimizer step runtime by ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82403 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T22:54:21.1970943Z test_mixture_of_experts_offload_false_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83931 2023-01-11T22:54:21.1971163Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83932 2023-01-11T22:54:21.1971536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.1971713Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.1972102Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.1972295Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.1972658Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.1972818Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.1973361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.1973553Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.1973800Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.1974044Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.1974539Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.1974937Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.1975166Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.1975441Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.1976479Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.1976601Z warnings.warn( 2023-01-11T22:54:21.1976843Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:54:21.1977863Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.1977976Z warnings.warn( 2023-01-11T22:54:21.1978219Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:54:21.1978618Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.1979371Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1979767Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.1980515Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1980758Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:54:21.1981001Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:54:21.1981397Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.1981771Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.1982012Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:54:21.1982259Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:54:21.1982657Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.1983048Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 
2023-01-11T22:54:21.1983289Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:54:21.1983589Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:54:21.1983984Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.1984372Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.1984638Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:54:21.1985035Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.1985268Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:54:21.1985662Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.1985904Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:54:21.1986292Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.1986527Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:54:21.1986916Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.1987156Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:54:21.1987520Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 
2023-01-11T22:54:21.1987755Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:54:21.1988150Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.1988393Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:54:21.1988781Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.1989021Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:54:21.1989410Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.1990159Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1990894Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1991637Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1992361Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1993168Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1993934Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1994822Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1995600Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1996378Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.1997140Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1997925Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1998690Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.1999412Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2000352Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2001142Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2001900Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2002753Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2003562Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2004341Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2005105Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2005892Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2006699Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2007504Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2008269Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2009040Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2009795Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2010568Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2011339Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2012106Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2013238Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2014048Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2014815Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2015584Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2016353Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2017123Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2017891Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2018690Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2019500Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2020282Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2021044Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2021337Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:54:21.2021696Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:54:21.2022147Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.2022585Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.2022911Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:54:21.2023137Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:54:21.2023631Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 
2023-01-11T22:54:21.2024071Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.2024355Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:54:21.2024782Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.2025059Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:54:21.2025493Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.2025809Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:54:21.2026239Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.2026560Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:54:21.2026950Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.2027228Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:54:21.2027495Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:54:21.2027927Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.2028354Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 
2023-01-11T22:54:21.2028629Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:54:21.2028898Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:54:21.2029339Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.2029806Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.2030034Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:54:21.2030310Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:54:21.2030741Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.2031168Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.2031444Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:54:21.2031750Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:54:21.2032263Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.2032695Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 
2023-01-11T22:54:21.2033057Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:54:21.2033286Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:54:21.2033724Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.2034154Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.2034430Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:54:21.2034716Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:54:21.2035146Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.2035579Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.2035857Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:54:21.2036173Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:54:21.2036611Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.2036984Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 
2023-01-11T22:54:21.2037264Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:54:21.2037547Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:54:21.2038016Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.2038453Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.2039262Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2040037Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2040853Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2041619Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2042403Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2043288Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2044123Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2044890Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2045670Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2046433Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2047249Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2048024Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2048833Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2049597Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2050376Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2051136Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2051907Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2052790Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2053798Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2054569Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2055349Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2056107Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2056873Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2057653Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2058421Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2059176Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2060008Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2060802Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2061571Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2062499Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2063288Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2064047Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2064826Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2065585Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2066390Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2067159Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2067939Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2068705Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2069482Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2070242Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2071008Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2071912Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2072738Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2073503Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2074224Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2074985Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2075746Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2076511Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2077278Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2078035Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2078805Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2079594Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2080359Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2081226Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2082007Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2082769Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2083588Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2084343Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2085111Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2085917Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2086684Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2087440Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2088212Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2088983Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2089745Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2090087Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:54:21.2090364Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:54:21.2090898Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 
2023-01-11T22:54:21.2091352Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.2092133Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2092454Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:54:21.2093079Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.2093368Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:54:21.2093751Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.2094037Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:54:21.2094313Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:54:21.2094792Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.2095236Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 
2023-01-11T22:54:21.2095511Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:54:21.2095781Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:54:21.2096215Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.2096650Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.2096807Z dist init r=0, world=2 2023-01-11T22:54:21.2096902Z dist init r=1, world=2 2023-01-11T22:54:21.2097041Z ok (6.217s) 2023-01-11T22:54:21.2097449Z test_mixture_of_experts_offload_false_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84254 2023-01-11T22:54:21.2097718Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84255 2023-01-11T22:54:21.2098130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.2098377Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.2098801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.2099030Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.2099383Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.2099595Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.2100017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.2100373Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:54:21.2100659Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.2100938Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.2101446Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.2102015Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.2102230Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.2102510Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.2103571Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.2103730Z warnings.warn( 2023-01-11T22:54:21.2104052Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:54:21.2105142Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.2105295Z warnings.warn( 2023-01-11T22:54:21.2105573Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:54:21.2106018Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.2106449Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.2107015Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:54:21.2107167Z warnings.warn( 2023-01-11T22:54:21.2107679Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:54:21.2107869Z warnings.warn( 2023-01-11T22:54:21.2108655Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2109428Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2109717Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:54:21.2109992Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:54:21.2110428Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.2110934Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.2111210Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:54:21.2111553Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:54:21.2112051Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.2112440Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.2112725Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:54:21.2112998Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:54:21.2113432Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.2113861Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 
2023-01-11T22:54:21.2114139Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:54:21.2114415Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:54:21.2114891Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.2115328Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.2115550Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:54:21.2115826Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:54:21.2116253Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.2116680Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.2116958Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:54:21.2117229Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:54:21.2117703Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.2118162Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 
2023-01-11T22:54:21.2118445Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:54:21.2118716Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:54:21.2119091Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.2119518Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.2120303Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2121069Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2121955Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2122726Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2123541Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2124323Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2125101Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2125861Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2126655Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2127420Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2128226Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2128990Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2129795Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2130558Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2131441Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2132221Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2133251Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2134036Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2134801Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2135552Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2136364Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2137135Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2137898Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2138664Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2138987Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:54:21.2139269Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:54:21.2139707Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.2140141Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 
2023-01-11T22:54:21.2140556Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:54:21.2140782Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:54:21.2141229Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.2141722Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.2142010Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:54:21.2142280Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:54:21.2142722Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.2143168Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.2143451Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:54:21.2143762Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:54:21.2144150Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.2144586Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 
2023-01-11T22:54:21.2144863Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:54:21.2145212Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:54:21.2145646Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.2146081Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.2146361Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:54:21.2146635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:54:21.2147095Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.2147523Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.2147746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:54:21.2148019Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:54:21.2148450Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.2148887Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 
2023-01-11T22:54:21.2149164Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:54:21.2149440Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:54:21.2149870Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.2150331Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.2150607Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:54:21.2150884Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:54:21.2151365Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.2151845Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.2152132Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:54:21.2152406Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:54:21.2152840Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.2153267Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 
2023-01-11T22:54:21.2153585Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:54:21.2153869Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:54:21.2154248Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.2154676Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.2154954Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:54:21.2155224Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:54:21.2155654Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.2156084Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.2156873Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2157676Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2158489Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2159267Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2160077Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2160846Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2161736Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2162527Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2163299Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2164101Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2164879Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2165644Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2166419Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2167183Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2167961Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2168726Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2169523Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2170318Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2171161Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2171968Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2172761Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2173707Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2174481Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2175245Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2176012Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2176820Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2177595Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2178370Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2179143Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2179901Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2180783Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2181598Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2182387Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2183202Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2183978Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2184733Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2185511Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2186273Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2187036Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2187800Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2188580Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2188810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:54:21.2189122Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:54:21.2189635Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 
2023-01-11T22:54:21.2190070Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.2190919Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2191210Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:54:21.2191498Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:54:21.2191942Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.2192378Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.2192701Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:54:21.2192923Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:54:21.2193361Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.2193788Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 
2023-01-11T22:54:21.2194068Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:54:21.2194350Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:54:21.2194792Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.2195244Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.2195397Z dist init r=0, world=2 2023-01-11T22:54:21.2195578Z dist init r=1, world=2 2023-01-11T22:54:21.2195665Z ok (6.317s) 2023-01-11T22:54:21.2196049Z test_mixture_of_experts_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84577 2023-01-11T22:54:21.2196307Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84578 2023-01-11T22:54:21.2196776Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.2196990Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.2197418Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.2197644Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.2198049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.2198208Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.2198667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.2198895Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:54:21.2199185Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.2199460Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.2199970Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.2200407Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.2200673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.2200985Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.2202107Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.2202304Z warnings.warn( 2023-01-11T22:54:21.2203299Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.2203450Z warnings.warn( 2023-01-11T22:54:21.2262799Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:54:21.2263135Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:54:21.2263602Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.2264146Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2023-01-11T22:54:21.2264254Z warnings.warn( 2023-01-11T22:54:21.2265006Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2265402Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.2265931Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2023-01-11T22:54:21.2266033Z warnings.warn( 2023-01-11T22:54:21.2266776Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2267013Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:54:21.2267248Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:54:21.2267638Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.2268024Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.2268255Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:54:21.2268695Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:54:21.2269083Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.2269466Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.2269775Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:54:21.2270013Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:54:21.2270398Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.2270774Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 
2023-01-11T22:54:21.2271008Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:54:21.2271235Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:54:21.2271613Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.2271985Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.2272216Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:54:21.2272441Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:54:21.2272817Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.2273193Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.2273424Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:54:21.2273650Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:54:21.2274025Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.2274403Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 
2023-01-11T22:54:21.2274629Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:54:21.2274849Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:54:21.2275224Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.2275602Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.2276345Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2277068Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2277796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2278646Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2279384Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2280102Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2280832Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2281552Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2282274Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2282997Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2283715Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2284430Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2285150Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2285867Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2286585Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2287368Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2288130Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2288854Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2289577Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2290293Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2291010Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2291726Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2292446Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2293370Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2293619Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:54:21.2293847Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:54:21.2294247Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.2294633Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.2294858Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:54:21.2295242Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 
2023-01-11T22:54:21.2295565Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:54:21.2295959Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.2296187Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:54:21.2296465Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:54:21.2296864Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.2297245Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.2297477Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:54:21.2297694Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:54:21.2298082Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.2298460Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.2298689Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:54:21.2298914Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:54:21.2299296Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.2299671Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 
2023-01-11T22:54:21.2299896Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:54:21.2300120Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:54:21.2300495Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.2300865Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.2301095Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:54:21.2301317Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:54:21.2301696Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.2302073Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.2302303Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:54:21.2302525Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:54:21.2302905Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.2303284Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 
2023-01-11T22:54:21.2303506Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:54:21.2303727Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:54:21.2304105Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.2304572Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.2304798Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:54:21.2305019Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:54:21.2305444Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.2305833Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.2306058Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:54:21.2306272Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:54:21.2306650Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.2307031Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 
2023-01-11T22:54:21.2307255Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:54:21.2307478Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:54:21.2307857Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.2308233Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.2308971Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2309704Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2310436Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2311164Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2311890Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2312609Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2313328Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2314117Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2314881Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2315612Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2316334Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2317052Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2317770Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2318490Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2319213Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2319924Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2320645Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2321361Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2322078Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2322857Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2323618Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2324345Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2325064Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2325782Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2326497Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2327213Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2327930Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2328641Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2329361Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2330078Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2330796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2331567Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2332328Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2333183Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2333916Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2334634Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2335349Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2336067Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2336786Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2337501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2338220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2338465Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:54:21.2338857Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.2339091Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:54:21.2339480Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.2340298Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2340528Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:54:21.2340805Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:54:21.2341207Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.2341594Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.2341817Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:54:21.2342048Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:54:21.2342439Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.2342822Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.2343053Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:54:21.2343275Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:54:21.2343655Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.2344033Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.2344136Z dist init r=1, world=2 2023-01-11T22:54:21.2344235Z dist init r=0, world=2 2023-01-11T22:54:21.2344327Z ok (6.217s) 2023-01-11T22:54:21.2344696Z test_mixture_of_experts_offload_true_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84900 2023-01-11T22:54:21.2344909Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84901 2023-01-11T22:54:21.2345279Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.2345446Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.2345816Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.2345999Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.2346359Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.2346525Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.2346895Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.2347077Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.2347311Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.2347545Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.2347933Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.2348320Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.2348541Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.2348818Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.2349888Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.2350004Z warnings.warn( 2023-01-11T22:54:21.2351016Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.2351123Z warnings.warn( 2023-01-11T22:54:21.2351357Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:54:21.2351592Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:54:21.2351985Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.2352728Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2352851Z File "", line 1, in 2023-01-11T22:54:21.2353060Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2353194Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2353384Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2353526Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2353730Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2353828Z self.run() 2023-01-11T22:54:21.2354024Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2354163Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2354501Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2354625Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2354974Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2355093Z getattr(self, test_name)() 2023-01-11T22:54:21.2355449Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2355538Z fn() 2023-01-11T22:54:21.2355892Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2356006Z test(self, **param_kwargs) 2023-01-11T22:54:21.2356356Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2356473Z return func(*args, **kwargs) 2023-01-11T22:54:21.2356702Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2356806Z self.run_subtests( 
2023-01-11T22:54:21.2357152Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2357382Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2357744Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2357888Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2358254Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2358412Z output = model(*input) 2023-01-11T22:54:21.2358738Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2358868Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2359241Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2359410Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2359774Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2359893Z _lazy_init(state, module) 2023-01-11T22:54:21.2360241Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2360398Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2360813Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2360950Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2361282Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2361398Z return func(*args, **kwargs) 2023-01-11T22:54:21.2361766Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2361859Z p_assert( 2023-01-11T22:54:21.2362192Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2362309Z traceback.print_stack() 2023-01-11T22:54:21.2362693Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.2363432Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2363556Z File "", line 1, in 2023-01-11T22:54:21.2363758Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2363891Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2364086Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2364231Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2364435Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2364524Z self.run() 2023-01-11T22:54:21.2364717Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2364852Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2365192Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2365323Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2365681Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2365797Z getattr(self, test_name)() 2023-01-11T22:54:21.2366148Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2366231Z fn() 2023-01-11T22:54:21.2366683Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2366803Z test(self, **param_kwargs) 2023-01-11T22:54:21.2367156Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2367277Z return func(*args, **kwargs) 2023-01-11T22:54:21.2367569Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2367684Z self.run_subtests( 2023-01-11T22:54:21.2368039Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2368186Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2368541Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2368693Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2369060Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2369172Z output = model(*input) 2023-01-11T22:54:21.2369491Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2369623Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2369998Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2370160Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2370521Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.2370633Z _lazy_init(state, module) 2023-01-11T22:54:21.2370979Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2371143Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2371535Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2371671Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2371995Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2372109Z return func(*args, **kwargs) 2023-01-11T22:54:21.2372479Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2372571Z p_assert( 2023-01-11T22:54:21.2373069Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2373194Z traceback.print_stack() 2023-01-11T22:54:21.2373436Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:54:21.2373677Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:54:21.2374073Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.2374461Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.2375430Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:795: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.2375601Z return torch._VF.split_with_sizes(self, split_size, dim) 2023-01-11T22:54:21.2376661Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:795: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.2376899Z return torch._VF.split_with_sizes(self, split_size, dim) 2023-01-11T22:54:21.2377029Z File "", line 1, in 2023-01-11T22:54:21.2377233Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2377367Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2377564Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2377705Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2377917Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2378006Z self.run() 2023-01-11T22:54:21.2378206Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2378348Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2378696Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2378828Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2379183Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2379304Z getattr(self, test_name)() 2023-01-11T22:54:21.2379657Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2379739Z fn() 2023-01-11T22:54:21.2380100Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2380226Z test(self, **param_kwargs) 2023-01-11T22:54:21.2380577Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2380692Z return func(*args, **kwargs) 2023-01-11T22:54:21.2380930Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2381035Z self.run_subtests( 2023-01-11T22:54:21.2381382Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2381531Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2381885Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2382030Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2382395Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2382509Z output = model(*input) 2023-01-11T22:54:21.2382828Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2382960Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2383328Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2383491Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2383852Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.2383965Z _lazy_init(state, module) 2023-01-11T22:54:21.2384311Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2384469Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2384930Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2385065Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2385398Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2385505Z return func(*args, **kwargs) 2023-01-11T22:54:21.2385972Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2386074Z p_assert( 2023-01-11T22:54:21.2386407Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2386526Z traceback.print_stack() 2023-01-11T22:54:21.2386646Z File "", line 1, in 2023-01-11T22:54:21.2386849Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2386989Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2387176Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2387317Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2387518Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2387614Z self.run() 2023-01-11T22:54:21.2387809Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2387951Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2388284Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2388401Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.2388752Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2388867Z getattr(self, test_name)() 2023-01-11T22:54:21.2389220Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2389313Z fn() 2023-01-11T22:54:21.2389673Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2389789Z test(self, **param_kwargs) 2023-01-11T22:54:21.2390138Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2390250Z return func(*args, **kwargs) 2023-01-11T22:54:21.2390489Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2390592Z self.run_subtests( 2023-01-11T22:54:21.2390937Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2391092Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2391452Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2391598Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2391964Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2392068Z output = model(*input) 2023-01-11T22:54:21.2392387Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2392517Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2392883Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.2393056Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2393417Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2393596Z _lazy_init(state, module) 2023-01-11T22:54:21.2393950Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2394102Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2394493Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2394683Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2395035Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2395156Z return func(*args, **kwargs) 2023-01-11T22:54:21.2395531Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2395628Z p_assert( 2023-01-11T22:54:21.2395957Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2396072Z traceback.print_stack() 2023-01-11T22:54:21.2396310Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:54:21.2396551Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:54:21.2396946Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.2397339Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 
2023-01-11T22:54:21.2398087Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2398826Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2398955Z File "", line 1, in 2023-01-11T22:54:21.2399160Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2399300Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2399490Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2399636Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2399841Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2399941Z self.run() 2023-01-11T22:54:21.2400139Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2400280Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2400619Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2400746Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2401097Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2401211Z getattr(self, test_name)() 2023-01-11T22:54:21.2401570Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2401662Z fn() 
2023-01-11T22:54:21.2402023Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2402137Z test(self, **param_kwargs) 2023-01-11T22:54:21.2402484Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2402663Z return func(*args, **kwargs) 2023-01-11T22:54:21.2402894Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2403000Z self.run_subtests( 2023-01-11T22:54:21.2403353Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2403506Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2403907Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2404056Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2404430Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2404541Z output = model(*input) 2023-01-11T22:54:21.2404850Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2404984Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2405351Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2405519Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2405878Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2405995Z _lazy_init(state, module) 2023-01-11T22:54:21.2406343Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2406506Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2406885Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2407022Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2407355Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2407476Z return func(*args, **kwargs) 2023-01-11T22:54:21.2407849Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2407945Z p_assert( 2023-01-11T22:54:21.2408276Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2408401Z traceback.print_stack() 2023-01-11T22:54:21.2408515Z File "", line 1, in 2023-01-11T22:54:21.2408718Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2408856Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2409055Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2409201Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2409411Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2409510Z self.run() 2023-01-11T22:54:21.2409697Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2409837Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2410176Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2410305Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2410660Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2410775Z getattr(self, test_name)() 2023-01-11T22:54:21.2411129Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2411222Z fn() 2023-01-11T22:54:21.2411569Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2411752Z test(self, **param_kwargs) 2023-01-11T22:54:21.2412113Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2412229Z return func(*args, **kwargs) 2023-01-11T22:54:21.2412471Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2412578Z self.run_subtests( 2023-01-11T22:54:21.2413241Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2413415Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2413769Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2413916Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2414283Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2414400Z output = model(*input) 2023-01-11T22:54:21.2414715Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2414851Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2415221Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2415391Z args, kwargs = 
_root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2415741Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2415855Z _lazy_init(state, module) 2023-01-11T22:54:21.2416201Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2416364Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2416757Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2416894Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2417223Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2417342Z return func(*args, **kwargs) 2023-01-11T22:54:21.2417704Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2417806Z p_assert( 2023-01-11T22:54:21.2418141Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2418264Z traceback.print_stack() 2023-01-11T22:54:21.2418503Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:54:21.2418745Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:54:21.2419143Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.2419532Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.2420275Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2421005Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2421214Z File "", line 1, in 2023-01-11T22:54:21.2421411Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2421547Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2421744Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2421885Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2422134Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2422238Z self.run() 2023-01-11T22:54:21.2422436Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2422568Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2422910Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2423036Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2423403Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2423521Z getattr(self, test_name)() 2023-01-11T22:54:21.2423870Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2423965Z fn() 2023-01-11T22:54:21.2424330Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in 
instantiated_test 2023-01-11T22:54:21.2424438Z test(self, **param_kwargs) 2023-01-11T22:54:21.2424787Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2424905Z return func(*args, **kwargs) 2023-01-11T22:54:21.2425143Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2425247Z self.run_subtests( 2023-01-11T22:54:21.2425591Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2425756Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2426111Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2426248Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2426622Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2426737Z output = model(*input) 2023-01-11T22:54:21.2427056Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2427191Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2427564Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2427738Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2428105Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2428209Z _lazy_init(state, module) 2023-01-11T22:54:21.2428553Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2428713Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:54:21.2429108Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2429244Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2429572Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2429693Z return func(*args, **kwargs) 2023-01-11T22:54:21.2430062Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2430234Z p_assert( 2023-01-11T22:54:21.2430573Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2430693Z traceback.print_stack() 2023-01-11T22:54:21.2430824Z File "", line 1, in 2023-01-11T22:54:21.2431032Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2431219Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2431425Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2431577Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2431771Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2431876Z self.run() 2023-01-11T22:54:21.2432081Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2432235Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2432580Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2432711Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2433066Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2433174Z getattr(self, test_name)() 2023-01-11T22:54:21.2433532Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2433623Z fn() 2023-01-11T22:54:21.2433986Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2434107Z test(self, **param_kwargs) 2023-01-11T22:54:21.2434465Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2434594Z return func(*args, **kwargs) 2023-01-11T22:54:21.2434843Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2434939Z self.run_subtests( 2023-01-11T22:54:21.2435295Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2435461Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2435828Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2435985Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2436360Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2436482Z output = model(*input) 2023-01-11T22:54:21.2436807Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2436935Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2437315Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2437490Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2437860Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.2437987Z _lazy_init(state, module) 2023-01-11T22:54:21.2438348Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2438518Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2438918Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2439062Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2439458Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2439588Z return func(*args, **kwargs) 2023-01-11T22:54:21.2439969Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2440075Z p_assert( 2023-01-11T22:54:21.2440411Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2440588Z traceback.print_stack() 2023-01-11T22:54:21.2440842Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:54:21.2441088Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:54:21.2441480Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.2441877Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.2442631Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2443379Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2443516Z File "", line 1, in 2023-01-11T22:54:21.2443730Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2443876Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2444085Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2444239Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2444456Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2444544Z self.run() 2023-01-11T22:54:21.2444750Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2444902Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2445248Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2445385Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2445750Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2445875Z getattr(self, test_name)() 2023-01-11T22:54:21.2446218Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2446324Z fn() 2023-01-11T22:54:21.2446696Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2446824Z test(self, **param_kwargs) 2023-01-11T22:54:21.2446954Z File "", line 1, in 2023-01-11T22:54:21.2447317Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2447446Z return func(*args, **kwargs) 2023-01-11T22:54:21.2447697Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2447795Z self.run_subtests( 2023-01-11T22:54:21.2448008Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2448153Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2448511Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2448739Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2448946Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2449099Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2449523Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2449666Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2449884Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2449991Z self.run() 2023-01-11T22:54:21.2450375Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2450497Z output = model(*input) 2023-01-11T22:54:21.2450703Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2450853Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2451162Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2451302Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2451641Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2451781Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2452160Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2452336Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2452698Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2452824Z getattr(self, test_name)() 2023-01-11T22:54:21.2453477Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2453589Z _lazy_init(state, module) 2023-01-11T22:54:21.2453951Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2454051Z fn() 2023-01-11T22:54:21.2454404Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2454577Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2454941Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2455066Z test(self, **param_kwargs) 2023-01-11T22:54:21.2455466Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2455593Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2455965Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2456093Z return func(*args, **kwargs) 2023-01-11T22:54:21.2456431Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 
2023-01-11T22:54:21.2456558Z return func(*args, **kwargs) 2023-01-11T22:54:21.2456809Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2456926Z self.run_subtests( 2023-01-11T22:54:21.2457307Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2457393Z p_assert( 2023-01-11T22:54:21.2457745Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2457910Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2458356Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2458485Z traceback.print_stack() 2023-01-11T22:54:21.2458850Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2459005Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2459444Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2459555Z output = model(*input) 2023-01-11T22:54:21.2459890Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2460035Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2460411Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2460587Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2460959Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2461083Z _lazy_init(state, module) 2023-01-11T22:54:21.2461465Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2461618Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2462023Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2462170Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2462510Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2462637Z return func(*args, **kwargs) 2023-01-11T22:54:21.2463014Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2463121Z p_assert( 2023-01-11T22:54:21.2463457Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2463567Z traceback.print_stack() 2023-01-11T22:54:21.2463816Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:54:21.2464066Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:54:21.2464471Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.2464868Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.2465622Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2466368Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2466502Z File "", line 1, in 2023-01-11T22:54:21.2466718Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2466862Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2467050Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2467203Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2467419Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2467591Z self.run() 2023-01-11T22:54:21.2467799Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2467949Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2468302Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2468418Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2468834Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2468965Z getattr(self, test_name)() 2023-01-11T22:54:21.2469334Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2469435Z fn() 2023-01-11T22:54:21.2469802Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2469932Z test(self, **param_kwargs) 2023-01-11T22:54:21.2470291Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:54:21.2470400Z return func(*args, **kwargs) 2023-01-11T22:54:21.2470648Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2470762Z self.run_subtests( 2023-01-11T22:54:21.2471121Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2471284Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2471650Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2471805Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2472183Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2472290Z output = model(*input) 2023-01-11T22:54:21.2472621Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2472764Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2473144Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2473323Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2473692Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2473816Z _lazy_init(state, module) 2023-01-11T22:54:21.2474171Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2474322Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2474722Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2474871Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.2475213Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2475340Z return func(*args, **kwargs) 2023-01-11T22:54:21.2475721Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2475827Z p_assert( 2023-01-11T22:54:21.2476164Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2476274Z traceback.print_stack() 2023-01-11T22:54:21.2476405Z File "", line 1, in 2023-01-11T22:54:21.2476616Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2476764Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2477040Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2477195Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2477410Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2477514Z self.run() 2023-01-11T22:54:21.2477701Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2477895Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2478259Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2478394Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2478760Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2478885Z getattr(self, test_name)() 2023-01-11T22:54:21.2479247Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2479352Z fn() 2023-01-11T22:54:21.2479700Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2479826Z test(self, **param_kwargs) 2023-01-11T22:54:21.2480185Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2480316Z return func(*args, **kwargs) 2023-01-11T22:54:21.2480565Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2480680Z self.run_subtests( 2023-01-11T22:54:21.2481039Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2481202Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2481546Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2481707Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2482084Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2482207Z output = model(*input) 2023-01-11T22:54:21.2482536Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2482679Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2483059Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2483235Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2483586Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2483709Z _lazy_init(state, module) 2023-01-11T22:54:21.2484071Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:54:21.2484241Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2484642Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2484786Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2485127Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2485256Z return func(*args, **kwargs) 2023-01-11T22:54:21.2485616Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2485721Z p_assert( 2023-01-11T22:54:21.2486060Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2486256Z traceback.print_stack() 2023-01-11T22:54:21.2486504Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:54:21.2486751Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:54:21.2487160Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.2487612Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.2488378Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2489123Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2489242Z File "", line 1, in 2023-01-11T22:54:21.2489454Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2489601Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2489809Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2489961Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2490176Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2490282Z self.run() 2023-01-11T22:54:21.2490488Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2490619Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2490972Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2491109Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2491474Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2491599Z getattr(self, test_name)() 2023-01-11T22:54:21.2491965Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2492064Z fn() 2023-01-11T22:54:21.2492412Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2492537Z test(self, **param_kwargs) 2023-01-11T22:54:21.2493077Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2493214Z return func(*args, **kwargs) 2023-01-11T22:54:21.2493471Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2493587Z self.run_subtests( 2023-01-11T22:54:21.2493947Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2494111Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2494461Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2494617Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2494993Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2495115Z output = model(*input) 2023-01-11T22:54:21.2495447Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2495586Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2496079Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2496257Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2496625Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2496730Z _lazy_init(state, module) 2023-01-11T22:54:21.2497142Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2497324Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2497732Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2497877Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2498214Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2498344Z return func(*args, **kwargs) 2023-01-11T22:54:21.2498723Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2498809Z p_assert( 2023-01-11T22:54:21.2499148Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2499281Z traceback.print_stack() 2023-01-11T22:54:21.2499413Z File "", line 1, in 2023-01-11T22:54:21.2499628Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2499774Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2499980Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2500114Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2500330Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2500441Z self.run() 2023-01-11T22:54:21.2500647Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2500795Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2501138Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2501274Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2501641Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2501747Z getattr(self, test_name)() 2023-01-11T22:54:21.2502112Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2502214Z fn() 2023-01-11T22:54:21.2502582Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:54:21.2502711Z test(self, **param_kwargs) 2023-01-11T22:54:21.2503071Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2503197Z return func(*args, **kwargs) 2023-01-11T22:54:21.2503447Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2503543Z self.run_subtests( 2023-01-11T22:54:21.2503903Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2504069Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2504431Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2504586Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2504963Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2505149Z output = model(*input) 2023-01-11T22:54:21.2505485Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2505607Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2505987Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2506213Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2506598Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2506721Z _lazy_init(state, module) 2023-01-11T22:54:21.2507077Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2507248Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2507648Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2507779Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2508121Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2508249Z return func(*args, **kwargs) 2023-01-11T22:54:21.2508635Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2508739Z p_assert( 2023-01-11T22:54:21.2509076Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2509205Z traceback.print_stack() 2023-01-11T22:54:21.2509452Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:54:21.2509680Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:54:21.2510091Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.2510490Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.2511246Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2511985Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2512124Z File "", line 1, in 2023-01-11T22:54:21.2512337Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2512482Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2512689Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2512841Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2513041Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2513147Z self.run() 2023-01-11T22:54:21.2513351Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2513500Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2513846Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2513984Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2514349Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2514541Z getattr(self, test_name)() 2023-01-11T22:54:21.2514893Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2514994Z fn() 2023-01-11T22:54:21.2515362Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2515544Z test(self, **param_kwargs) 2023-01-11T22:54:21.2515921Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2516049Z return func(*args, **kwargs) 2023-01-11T22:54:21.2516298Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2516412Z self.run_subtests( 2023-01-11T22:54:21.2516751Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2516920Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2517285Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2517440Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2517817Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2517940Z output = model(*input) 2023-01-11T22:54:21.2518266Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2518404Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2518766Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2518944Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2519322Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2519445Z _lazy_init(state, module) 2023-01-11T22:54:21.2519799Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2519971Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2520372Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2520518Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2520837Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2520963Z return func(*args, **kwargs) 2023-01-11T22:54:21.2521342Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2521450Z p_assert( 2023-01-11T22:54:21.2521790Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2521918Z traceback.print_stack() 2023-01-11T22:54:21.2522050Z File "", line 1, in 2023-01-11T22:54:21.2522262Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2522392Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2522595Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2522749Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2522963Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2523070Z self.run() 2023-01-11T22:54:21.2523274Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2523421Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2523819Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2523954Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2524318Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2524443Z getattr(self, test_name)() 2023-01-11T22:54:21.2524863Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2524971Z fn() 2023-01-11T22:54:21.2525344Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2525469Z test(self, **param_kwargs) 2023-01-11T22:54:21.2525809Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.2525935Z return func(*args, **kwargs) 2023-01-11T22:54:21.2526190Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2526306Z self.run_subtests( 2023-01-11T22:54:21.2526662Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2526824Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2527190Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2527343Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2527701Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2527823Z output = model(*input) 2023-01-11T22:54:21.2528151Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2528292Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2528673Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2528850Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2529216Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2529339Z _lazy_init(state, module) 2023-01-11T22:54:21.2529680Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2529852Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2530252Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2530396Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.2530738Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2530869Z return func(*args, **kwargs) 2023-01-11T22:54:21.2531248Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2531354Z p_assert( 2023-01-11T22:54:21.2531676Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2531808Z traceback.print_stack() 2023-01-11T22:54:21.2532056Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:54:21.2532296Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:54:21.2532697Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.2533266Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.2534172Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2534977Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2535739Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2536487Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2537225Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2537955Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2538693Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2539431Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2540165Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2540899Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2541033Z File "", line 1, in 2023-01-11T22:54:21.2541169Z File "", line 1, in 2023-01-11T22:54:21.2541369Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2541516Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2541724Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2541877Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2542090Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2542298Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2542512Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2542600Z self.run() 2023-01-11T22:54:21.2542803Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2542956Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2543204Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2543359Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2543575Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2543681Z self.run() 
2023-01-11T22:54:21.2544037Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2544153Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2544356Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2544509Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2544874Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2545004Z getattr(self, test_name)() 2023-01-11T22:54:21.2545343Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2545482Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2545841Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2545923Z fn() 2023-01-11T22:54:21.2546285Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2546411Z getattr(self, test_name)() 2023-01-11T22:54:21.2546779Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2546907Z test(self, **param_kwargs) 2023-01-11T22:54:21.2547265Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2547361Z fn() 2023-01-11T22:54:21.2547702Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2547828Z return func(*args, **kwargs) 2023-01-11T22:54:21.2548193Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2548317Z test(self, **param_kwargs) 2023-01-11T22:54:21.2548565Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2548680Z self.run_subtests( 2023-01-11T22:54:21.2549037Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2549167Z return func(*args, **kwargs) 2023-01-11T22:54:21.2549505Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2549669Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2549912Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2550025Z self.run_subtests( 2023-01-11T22:54:21.2550394Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2550551Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2550903Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2551065Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2551422Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2551611Z output = model(*input) 2023-01-11T22:54:21.2551991Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2552147Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2552474Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2552663Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2553055Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 
2023-01-11T22:54:21.2553175Z output = model(*input) 2023-01-11T22:54:21.2553533Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2553712Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2554045Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2554184Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2554551Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2554676Z _lazy_init(state, module) 2023-01-11T22:54:21.2555056Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2555233Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2555590Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2555742Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2556107Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2556236Z _lazy_init(state, module) 2023-01-11T22:54:21.2556636Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2556782Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2557136Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2557311Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2557653Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2557762Z return func(*args, 
**kwargs) 2023-01-11T22:54:21.2558166Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2558311Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2558690Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2558798Z p_assert( 2023-01-11T22:54:21.2559138Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2559266Z return func(*args, **kwargs) 2023-01-11T22:54:21.2559605Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2559719Z traceback.print_stack() 2023-01-11T22:54:21.2560100Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2560203Z p_assert( 2023-01-11T22:54:21.2560537Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2560667Z traceback.print_stack() 2023-01-11T22:54:21.2560916Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:54:21.2561232Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:54:21.2561642Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.2562045Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.2562852Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2563613Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2563750Z File "", line 1, in 2023-01-11T22:54:21.2563879Z File "", line 1, in 2023-01-11T22:54:21.2564097Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2564244Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2564453Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2564606Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2564817Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2564942Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2565156Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2565263Z self.run() 2023-01-11T22:54:21.2565467Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2565623Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2565826Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2565978Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2566174Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2566278Z self.run() 2023-01-11T22:54:21.2566633Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 
2023-01-11T22:54:21.2566768Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2566972Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2567119Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2567487Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2567616Z getattr(self, test_name)() 2023-01-11T22:54:21.2567937Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2568071Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2568433Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2568533Z fn() 2023-01-11T22:54:21.2568900Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2569026Z getattr(self, test_name)() 2023-01-11T22:54:21.2569397Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2569521Z test(self, **param_kwargs) 2023-01-11T22:54:21.2569858Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2570023Z fn() 2023-01-11T22:54:21.2570396Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2570526Z return func(*args, **kwargs) 2023-01-11T22:54:21.2570888Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2571014Z test(self, **param_kwargs) 2023-01-11T22:54:21.2571312Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2571416Z self.run_subtests( 
2023-01-11T22:54:21.2571779Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2571907Z return func(*args, **kwargs) 2023-01-11T22:54:21.2572263Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2572427Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2572676Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2572794Z self.run_subtests( 2023-01-11T22:54:21.2573351Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2573490Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2573850Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2574016Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2574393Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2574514Z output = model(*input) 2023-01-11T22:54:21.2574881Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2575041Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2575369Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2575508Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2575866Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2575990Z output = model(*input) 2023-01-11T22:54:21.2576370Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2576548Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2576876Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2577016Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2577382Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2577514Z _lazy_init(state, module) 2023-01-11T22:54:21.2577872Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2578047Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2578409Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2578658Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2579026Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2579149Z _lazy_init(state, module) 2023-01-11T22:54:21.2579545Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2579780Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2580119Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2580286Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2580625Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2580752Z return func(*args, **kwargs) 2023-01-11T22:54:21.2581214Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2581368Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2581755Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2581859Z p_assert( 2023-01-11T22:54:21.2582179Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2582310Z return func(*args, **kwargs) 2023-01-11T22:54:21.2582648Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2582776Z traceback.print_stack() 2023-01-11T22:54:21.2583152Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2583256Z p_assert( 2023-01-11T22:54:21.2583592Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2583718Z traceback.print_stack() 2023-01-11T22:54:21.2583949Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:54:21.2584189Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:54:21.2584593Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.2584994Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.2585751Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2586497Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2586630Z File "", line 1, in 2023-01-11T22:54:21.2586843Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2586991Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2587196Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2587331Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2587545Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2587650Z self.run() 2023-01-11T22:54:21.2587862Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2588011Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2588354Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2588491Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2588853Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2589023Z getattr(self, test_name)() 2023-01-11T22:54:21.2589393Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2589493Z fn() 2023-01-11T22:54:21.2589862Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2589986Z test(self, **param_kwargs) 2023-01-11T22:54:21.2590392Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2590527Z return func(*args, **kwargs) 2023-01-11T22:54:21.2590759Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2590872Z self.run_subtests( 2023-01-11T22:54:21.2591003Z File "", line 1, in 2023-01-11T22:54:21.2591361Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2591528Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2591895Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2592049Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2592260Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2592388Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2592764Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2592886Z output = model(*input) 2023-01-11T22:54:21.2593092Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2593247Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2593575Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2593720Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2593933Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2594021Z self.run() 2023-01-11T22:54:21.2594400Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2594577Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 
2023-01-11T22:54:21.2594789Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2594939Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2595308Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2595432Z _lazy_init(state, module) 2023-01-11T22:54:21.2595770Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2595890Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2596247Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2596418Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2596777Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2596901Z getattr(self, test_name)() 2023-01-11T22:54:21.2597303Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2597449Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2597813Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2597894Z fn() 2023-01-11T22:54:21.2598235Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2598426Z return func(*args, **kwargs) 2023-01-11T22:54:21.2598804Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2598931Z test(self, **param_kwargs) 2023-01-11T22:54:21.2599312Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2599417Z 
p_assert( 2023-01-11T22:54:21.2599824Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2599939Z return func(*args, **kwargs) 2023-01-11T22:54:21.2600285Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2600413Z traceback.print_stack() 2023-01-11T22:54:21.2600663Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2600785Z self.run_subtests( 2023-01-11T22:54:21.2601143Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2601308Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2601670Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2601811Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2602185Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2602308Z output = model(*input) 2023-01-11T22:54:21.2602637Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2602776Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2603156Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2603337Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2603705Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2603811Z _lazy_init(state, module) 2023-01-11T22:54:21.2604171Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 
163, in _lazy_init 2023-01-11T22:54:21.2604341Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2604740Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2604885Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2605227Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2605353Z return func(*args, **kwargs) 2023-01-11T22:54:21.2605737Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2605823Z p_assert( 2023-01-11T22:54:21.2606163Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2606290Z traceback.print_stack() 2023-01-11T22:54:21.2606539Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:54:21.2606778Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:54:21.2607182Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.2607580Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.2608331Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2609201Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2609342Z File "", line 1, in 2023-01-11T22:54:21.2609538Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2609683Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2609891Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2610049Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2610265Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2610369Z self.run() 2023-01-11T22:54:21.2610573Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2610720Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2611057Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2611195Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2611568Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2611692Z getattr(self, test_name)() 2023-01-11T22:54:21.2612054Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2612154Z fn() 2023-01-11T22:54:21.2612522Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2612649Z test(self, **param_kwargs) 2023-01-11T22:54:21.2613169Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2613303Z return func(*args, **kwargs) 2023-01-11T22:54:21.2613559Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2613675Z self.run_subtests( 2023-01-11T22:54:21.2614037Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2614201Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2614563Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2614719Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2615083Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2615204Z output = model(*input) 2023-01-11T22:54:21.2615532Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2615672Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2616055Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2616233Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2616600Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2616724Z _lazy_init(state, module) 2023-01-11T22:54:21.2617060Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2617321Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2617729Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2617875Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2618214Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2618401Z return func(*args, **kwargs) 2023-01-11T22:54:21.2618799Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2618903Z p_assert( 2023-01-11T22:54:21.2619221Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2619350Z traceback.print_stack() 2023-01-11T22:54:21.2619480Z File "", line 1, in 2023-01-11T22:54:21.2619696Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2619845Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2620051Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2620204Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2620401Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2620507Z self.run() 2023-01-11T22:54:21.2620714Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2620862Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2621206Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2621340Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2621705Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2621834Z getattr(self, test_name)() 2023-01-11T22:54:21.2622178Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2622279Z fn() 2023-01-11T22:54:21.2622645Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:54:21.2622773Z test(self, **param_kwargs) 2023-01-11T22:54:21.2623136Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2623263Z return func(*args, **kwargs) 2023-01-11T22:54:21.2623509Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2623623Z self.run_subtests( 2023-01-11T22:54:21.2623961Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2624123Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2624491Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2624645Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2625020Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2625141Z output = model(*input) 2023-01-11T22:54:21.2625472Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2625614Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2625974Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2626151Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2626518Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2626720Z _lazy_init(state, module) 2023-01-11T22:54:21.2627080Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2627250Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2627697Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2627847Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2628175Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2628303Z return func(*args, **kwargs) 2023-01-11T22:54:21.2628679Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2628784Z p_assert( 2023-01-11T22:54:21.2629122Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2629254Z traceback.print_stack() 2023-01-11T22:54:21.2629500Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:54:21.2629741Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:54:21.2630127Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.2630529Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.2631279Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2632025Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2632268Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:54:21.2632507Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:54:21.2632905Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.2633299Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.2633546Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:54:21.2633788Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:54:21.2634186Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.2634578Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.2634803Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:54:21.2635038Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:54:21.2635431Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.2635820Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 
2023-01-11T22:54:21.2636120Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:54:21.2636356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:54:21.2636758Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.2637195Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.2637440Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:54:21.2637654Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:54:21.2638054Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.2638451Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.2638689Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:54:21.2638920Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:54:21.2639321Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.2639708Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.2640458Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2640707Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:54:21.2640943Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:54:21.2641318Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.2641716Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.2642456Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2643197Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2643950Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2644696Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2645465Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2646313Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2647060Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2647796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2648530Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2649261Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2649993Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2650732Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2651457Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2652187Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2653044Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2653790Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2654519Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2655391Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2656135Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2656867Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2657600Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2658324Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2659051Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2659784Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2660506Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2661230Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2661959Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2662709Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2663436Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2664272Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2665008Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2665734Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2666464Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2667192Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2667916Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2668651Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2669375Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2670100Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2670353Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:54:21.2670593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:54:21.2671000Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.2671398Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.2672126Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2672922Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2673238Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:54:21.2673648Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.2673893Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:54:21.2674290Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.2675025Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2675266Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:54:21.2675660Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.2675904Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:54:21.2676300Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.2677040Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2677288Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:54:21.2677502Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:54:21.2677903Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.2678300Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.2679044Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2679293Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 0 2023-01-11T22:54:21.2679528Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 1 2023-01-11T22:54:21.2679925Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:54:21.2680323Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:54:21.2681068Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2681313Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 0 2023-01-11T22:54:21.2681609Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 1 2023-01-11T22:54:21.2681990Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:54:21.2682438Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:54:21.2683194Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2683438Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 0 2023-01-11T22:54:21.2683678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 1 2023-01-11T22:54:21.2684072Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:54:21.2684463Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:54:21.2685206Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2685450Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 1 2023-01-11T22:54:21.2685685Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 0 2023-01-11T22:54:21.2686083Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:54:21.2686479Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:54:21.2687205Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2687450Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 0 2023-01-11T22:54:21.2687684Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 1 2023-01-11T22:54:21.2688081Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:54:21.2688480Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:54:21.2689225Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2689468Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 1 2023-01-11T22:54:21.2689708Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 0 2023-01-11T22:54:21.2690097Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:54:21.2690489Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:54:21.2691310Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2691599Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 0 2023-01-11T22:54:21.2691820Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 1 2023-01-11T22:54:21.2692218Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:54:21.2692614Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:54:21.2693478Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2693730Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 0 2023-01-11T22:54:21.2693969Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 1 2023-01-11T22:54:21.2694365Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:54:21.2694758Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:54:21.2695493Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2696237Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2696974Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2697708Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2698449Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2699185Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2699910Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2700735Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2701521Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2702267Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2702998Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2703730Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2704453Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2705187Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2705911Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2706637Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2707368Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2708098Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2708821Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2709617Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2710388Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2711124Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2711833Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2712561Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2713288Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2714019Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2714748Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2715473Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2716203Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2716932Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2717660Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2718456Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2719220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2719961Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2720690Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2721420Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2722142Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2722874Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2723600Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2724329Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2725054Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2725783Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2726508Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2727304Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2728073Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2728808Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2729060Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 0 2023-01-11T22:54:21.2729299Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 1 2023-01-11T22:54:21.2729706Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:54:21.2730102Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:54:21.2730833Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2731563Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2731804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 0 2023-01-11T22:54:21.2732039Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 1 2023-01-11T22:54:21.2732434Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:54:21.2732808Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:54:21.2733793Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2734541Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2734782Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 1 2023-01-11T22:54:21.2735016Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 0 2023-01-11T22:54:21.2735411Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:54:21.2735803Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:54:21.2736628Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2737414Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2737660Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 0 2023-01-11T22:54:21.2737896Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 1 2023-01-11T22:54:21.2738297Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:54:21.2738688Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:54:21.2739403Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2740132Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2740251Z dist init r=1, world=2 2023-01-11T22:54:21.2740583Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.2740908Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.2741221Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:54:21.2741531Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.2741837Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.2742145Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.2742447Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.2742751Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.2743053Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.2743354Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.2743637Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.2744006Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.2744119Z dist init r=0, world=2 2023-01-11T22:54:21.2744496Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.2744816Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.2745124Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.2745431Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.2745739Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.2746041Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.2746346Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.2746648Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.2746931Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.2747236Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.2747538Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.2747843Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.2747949Z ok (6.417s) 2023-01-11T22:54:21.2748276Z test_mixture_of_experts_offload_true_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85247 2023-01-11T22:54:21.2748500Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85248 2023-01-11T22:54:21.2748893Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.2749075Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.2749459Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.2749633Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.2750004Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.2750182Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.2750566Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.2750759Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.2751006Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.2751329Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.2751738Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.2752132Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.2752390Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.2752628Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.2753669Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.2753791Z warnings.warn( 2023-01-11T22:54:21.2754814Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.2754929Z warnings.warn( 2023-01-11T22:54:21.2755176Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:54:21.2755420Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:54:21.2755819Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2023-01-11T22:54:21.2756357Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:54:21.2756471Z warnings.warn( 2023-01-11T22:54:21.2757218Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2757336Z File "", line 1, in 2023-01-11T22:54:21.2757551Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2757698Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2757904Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2758060Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2758279Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2758387Z self.run() 2023-01-11T22:54:21.2758574Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2758723Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2759073Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2759210Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2759577Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2759703Z getattr(self, test_name)() 2023-01-11T22:54:21.2760067Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2760230Z fn() 2023-01-11T22:54:21.2760586Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2760712Z test(self, **param_kwargs) 2023-01-11T22:54:21.2761072Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2761200Z return func(*args, **kwargs) 2023-01-11T22:54:21.2761497Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2761619Z self.run_subtests( 2023-01-11T22:54:21.2761982Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2762149Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2762493Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2762651Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2763059Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2763183Z output = model(*input) 2023-01-11T22:54:21.2763511Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2763652Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2764036Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2764213Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2764564Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2764689Z _lazy_init(state, module) 2023-01-11T22:54:21.2765045Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:54:21.2765219Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2765617Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2765765Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2766109Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2766237Z return func(*args, **kwargs) 2023-01-11T22:54:21.2766614Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2766700Z p_assert( 2023-01-11T22:54:21.2767042Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2767171Z traceback.print_stack() 2023-01-11T22:54:21.2767571Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.2768110Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:54:21.2768227Z warnings.warn( 2023-01-11T22:54:21.2768982Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2769117Z File "", line 1, in 2023-01-11T22:54:21.2769330Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2769457Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2769663Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2769881Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2770097Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2770203Z self.run() 2023-01-11T22:54:21.2770409Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2770561Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2770940Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2771085Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2771458Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2771583Z getattr(self, test_name)() 2023-01-11T22:54:21.2771946Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2772051Z fn() 2023-01-11T22:54:21.2772417Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2772543Z test(self, **param_kwargs) 2023-01-11T22:54:21.2773039Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2773174Z return func(*args, **kwargs) 2023-01-11T22:54:21.2773433Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2773548Z self.run_subtests( 2023-01-11T22:54:21.2773909Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2774077Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2774438Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2774596Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2774954Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2775076Z output = model(*input) 2023-01-11T22:54:21.2775403Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2775548Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2775929Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2776105Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2776473Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2776595Z _lazy_init(state, module) 2023-01-11T22:54:21.2776931Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2777106Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2777505Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2777651Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2777992Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2778118Z return func(*args, **kwargs) 2023-01-11T22:54:21.2778495Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2778602Z p_assert( 2023-01-11T22:54:21.2778921Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2779049Z traceback.print_stack() 2023-01-11T22:54:21.2779392Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:54:21.2779637Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:54:21.2780050Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.2780862Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2781004Z File "", line 1, in 2023-01-11T22:54:21.2781217Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2781363Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2781548Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2781705Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2781922Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2782027Z self.run() 2023-01-11T22:54:21.2782231Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2782379Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2782732Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2782866Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2783211Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2783339Z getattr(self, test_name)() 2023-01-11T22:54:21.2783701Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2783804Z fn() 2023-01-11T22:54:21.2784172Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2784297Z test(self, **param_kwargs) 2023-01-11T22:54:21.2784654Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2784779Z return func(*args, **kwargs) 2023-01-11T22:54:21.2785012Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2785130Z self.run_subtests( 2023-01-11T22:54:21.2785486Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2785649Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2786012Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2786169Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2786545Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2786665Z output = model(*input) 2023-01-11T22:54:21.2786975Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2787114Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2787496Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2787674Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2788043Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2788165Z _lazy_init(state, module) 2023-01-11T22:54:21.2788523Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2788761Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2789147Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2789294Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2789683Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2789817Z return func(*args, **kwargs) 2023-01-11T22:54:21.2790204Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2790311Z p_assert( 2023-01-11T22:54:21.2790648Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2790776Z traceback.print_stack() 2023-01-11T22:54:21.2791164Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.2791913Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2792051Z File "", line 1, in 2023-01-11T22:54:21.2792267Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2792416Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2792620Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2792773Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2792987Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2793078Z self.run() 2023-01-11T22:54:21.2793282Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2793431Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2793775Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2793911Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2794279Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2794405Z getattr(self, test_name)() 2023-01-11T22:54:21.2794764Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2794845Z fn() 2023-01-11T22:54:21.2795212Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2795337Z test(self, **param_kwargs) 2023-01-11T22:54:21.2795704Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2795831Z return func(*args, **kwargs) 2023-01-11T22:54:21.2796080Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2796196Z self.run_subtests( 2023-01-11T22:54:21.2796551Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2796696Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2797059Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2797215Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2797590Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2797776Z output = model(*input) 2023-01-11T22:54:21.2798112Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2798253Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2798631Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2798790Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2799207Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.2799338Z _lazy_init(state, module) 2023-01-11T22:54:21.2799701Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2799872Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2800270Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2800420Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2800758Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2800868Z return func(*args, **kwargs) 2023-01-11T22:54:21.2801248Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2801356Z p_assert( 2023-01-11T22:54:21.2801697Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2801826Z traceback.print_stack() 2023-01-11T22:54:21.2802074Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:54:21.2802320Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:54:21.2802721Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.2803475Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2803611Z File "", line 1, in 2023-01-11T22:54:21.2803807Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2803949Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2804153Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2804306Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2804520Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2804629Z self.run() 2023-01-11T22:54:21.2804832Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2804962Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2805307Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2805441Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2805807Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2805931Z getattr(self, test_name)() 2023-01-11T22:54:21.2806294Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2806393Z fn() 2023-01-11T22:54:21.2806758Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2806865Z test(self, **param_kwargs) 2023-01-11T22:54:21.2807296Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2807424Z return func(*args, **kwargs) 2023-01-11T22:54:21.2807675Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2807791Z self.run_subtests( 2023-01-11T22:54:21.2808191Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2808361Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2808731Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2808868Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2809244Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2809372Z output = model(*input) 2023-01-11T22:54:21.2809701Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2809840Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2810218Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2810394Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2810767Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2810871Z _lazy_init(state, module) 2023-01-11T22:54:21.2811224Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2811396Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2811794Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2811942Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2812282Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2812409Z return func(*args, **kwargs) 2023-01-11T22:54:21.2812788Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2813037Z p_assert( 2023-01-11T22:54:21.2813391Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2813521Z traceback.print_stack() 2023-01-11T22:54:21.2813922Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.2814669Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2814806Z File "", line 1, in 2023-01-11T22:54:21.2815017Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2815162Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2815371Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2815505Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2815719Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2815828Z self.run() 2023-01-11T22:54:21.2816032Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2816181Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2816525Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2816795Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2817148Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2817273Z getattr(self, test_name)() 2023-01-11T22:54:21.2817633Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2817799Z fn() 2023-01-11T22:54:21.2818186Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2818311Z test(self, **param_kwargs) 2023-01-11T22:54:21.2818671Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2818795Z return func(*args, **kwargs) 2023-01-11T22:54:21.2819024Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2819144Z self.run_subtests( 2023-01-11T22:54:21.2819504Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2819667Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2820034Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2820188Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2820562Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2820685Z output = model(*input) 2023-01-11T22:54:21.2820995Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2821135Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2821518Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2821697Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2822065Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.2822188Z _lazy_init(state, module) 2023-01-11T22:54:21.2822546Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2822715Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2823116Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2823243Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2823584Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2823719Z return func(*args, **kwargs) 2023-01-11T22:54:21.2824099Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2824204Z p_assert( 2023-01-11T22:54:21.2824541Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2824669Z traceback.print_stack() 2023-01-11T22:54:21.2824901Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:54:21.2825146Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:54:21.2825548Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.2826304Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2826504Z File "", line 1, in 2023-01-11T22:54:21.2826719Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2826863Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2827111Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2827268Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2827467Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2827574Z self.run() 2023-01-11T22:54:21.2827783Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2827931Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2828285Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2828424Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2828788Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2828913Z getattr(self, test_name)() 2023-01-11T22:54:21.2829257Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2829362Z fn() 2023-01-11T22:54:21.2829729Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2829855Z test(self, **param_kwargs) 2023-01-11T22:54:21.2830212Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2830338Z return func(*args, **kwargs) 2023-01-11T22:54:21.2830587Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2830706Z self.run_subtests( 2023-01-11T22:54:21.2831044Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2831206Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2831575Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2831732Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2832110Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2832232Z output = model(*input) 2023-01-11T22:54:21.2832561Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2832701Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2833063Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2833239Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2833608Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2833731Z _lazy_init(state, module) 2023-01-11T22:54:21.2834087Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2834258Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2834655Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2834799Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2835117Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2835309Z return func(*args, **kwargs) 2023-01-11T22:54:21.2835700Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2835806Z p_assert( 2023-01-11T22:54:21.2836144Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2836273Z traceback.print_stack() 2023-01-11T22:54:21.2836719Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.2837483Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2837618Z File "", line 1, in 2023-01-11T22:54:21.2837816Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2837964Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2838170Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2838321Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2838535Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2838645Z self.run() 2023-01-11T22:54:21.2838850Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2838998Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2839326Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2839462Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2839831Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2839961Z getattr(self, test_name)() 2023-01-11T22:54:21.2840322Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2840423Z fn() 2023-01-11T22:54:21.2840789Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2840914Z test(self, **param_kwargs) 2023-01-11T22:54:21.2841257Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2841385Z return func(*args, **kwargs) 2023-01-11T22:54:21.2841634Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2841748Z self.run_subtests( 2023-01-11T22:54:21.2842106Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2842273Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2842637Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2842793Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2843150Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2843276Z output = model(*input) 2023-01-11T22:54:21.2843606Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2843746Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2844124Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2844300Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2844668Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.2844853Z _lazy_init(state, module) 2023-01-11T22:54:21.2845197Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2845367Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2845817Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2845968Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2846313Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2846441Z return func(*args, **kwargs) 2023-01-11T22:54:21.2846819Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2846932Z p_assert( 2023-01-11T22:54:21.2847252Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2847381Z traceback.print_stack() 2023-01-11T22:54:21.2847629Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:54:21.2847875Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:54:21.2848282Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.2849029Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2849163Z File "", line 1, in 2023-01-11T22:54:21.2849381Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2849524Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2849712Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2849865Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2850080Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2850188Z self.run() 2023-01-11T22:54:21.2850395Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2850543Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2850891Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2851008Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2851374Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2851502Z getattr(self, test_name)() 2023-01-11T22:54:21.2851866Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2851966Z fn() 2023-01-11T22:54:21.2852334Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2852460Z test(self, **param_kwargs) 2023-01-11T22:54:21.2852821Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2853093Z return func(*args, **kwargs) 2023-01-11T22:54:21.2853348Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2853465Z self.run_subtests( 2023-01-11T22:54:21.2853827Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2854075Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2854447Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2854605Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2854982Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2855145Z output = model(*input) 2023-01-11T22:54:21.2855493Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2855633Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2856017Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2856193Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2856562Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2856689Z _lazy_init(state, module) 2023-01-11T22:54:21.2857047Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2857200Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2857600Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2857746Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2858084Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2858211Z return func(*args, **kwargs) 2023-01-11T22:54:21.2858589Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2858694Z p_assert( 2023-01-11T22:54:21.2859037Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2859147Z traceback.print_stack() 2023-01-11T22:54:21.2859550Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.2860303Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2860438Z File "", line 1, in 2023-01-11T22:54:21.2860651Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2860796Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2861000Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2861158Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2861373Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2861461Z self.run() 2023-01-11T22:54:21.2861665Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2861813Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2862160Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2862296Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2862659Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2862784Z getattr(self, test_name)() 2023-01-11T22:54:21.2863145Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2863293Z fn() 2023-01-11T22:54:21.2863703Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2863831Z test(self, **param_kwargs) 2023-01-11T22:54:21.2864189Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2864315Z return func(*args, **kwargs) 2023-01-11T22:54:21.2864613Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2864736Z self.run_subtests( 2023-01-11T22:54:21.2865100Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2865247Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2865612Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2865774Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2866151Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2866274Z output = model(*input) 2023-01-11T22:54:21.2866602Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2866746Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2867127Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2867287Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2867657Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.2867783Z _lazy_init(state, module) 2023-01-11T22:54:21.2868138Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2868312Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2868712Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2868858Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2869201Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2869310Z return func(*args, **kwargs) 2023-01-11T22:54:21.2869688Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2869792Z p_assert( 2023-01-11T22:54:21.2870133Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2870262Z traceback.print_stack() 2023-01-11T22:54:21.2870510Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:54:21.2870762Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:54:21.2871164Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.2871922Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2872038Z File "", line 1, in 2023-01-11T22:54:21.2872250Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2872399Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2872604Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2872822Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2873040Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2873147Z self.run() 2023-01-11T22:54:21.2873353Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2873484Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2873889Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2874033Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2874403Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2874529Z getattr(self, test_name)() 2023-01-11T22:54:21.2874893Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2874998Z fn() 2023-01-11T22:54:21.2875348Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2875474Z test(self, **param_kwargs) 2023-01-11T22:54:21.2875833Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2875960Z return func(*args, **kwargs) 2023-01-11T22:54:21.2876214Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2876330Z self.run_subtests( 2023-01-11T22:54:21.2876686Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2876851Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2877196Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2877355Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2877730Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2877852Z output = model(*input) 2023-01-11T22:54:21.2878179Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2878320Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2878703Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2878881Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2879252Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2879357Z _lazy_init(state, module) 2023-01-11T22:54:21.2879713Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2879888Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2880289Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2880433Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2880771Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2880902Z return func(*args, **kwargs) 2023-01-11T22:54:21.2881280Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2881366Z p_assert( 2023-01-11T22:54:21.2881705Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2881833Z traceback.print_stack() 2023-01-11T22:54:21.2882236Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.2883068Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2883202Z File "", line 1, in 2023-01-11T22:54:21.2883460Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2883612Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2883799Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2883952Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2884168Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2884273Z self.run() 2023-01-11T22:54:21.2884482Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2884630Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2884979Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2885115Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2885467Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2885592Z getattr(self, test_name)() 2023-01-11T22:54:21.2885952Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2886051Z fn() 2023-01-11T22:54:21.2886418Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2886542Z test(self, **param_kwargs) 2023-01-11T22:54:21.2886902Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2887032Z return func(*args, **kwargs) 2023-01-11T22:54:21.2887264Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2887379Z self.run_subtests( 2023-01-11T22:54:21.2887734Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2887900Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2888267Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2888421Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2888797Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2888919Z output = model(*input) 2023-01-11T22:54:21.2889236Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2889375Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2889756Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2889933Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2890303Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.2890426Z _lazy_init(state, module) 2023-01-11T22:54:21.2890781Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2890951Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2891331Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2891541Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2891888Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2892015Z return func(*args, **kwargs) 2023-01-11T22:54:21.2892394Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2892497Z p_assert( 2023-01-11T22:54:21.2893054Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2893204Z traceback.print_stack() 2023-01-11T22:54:21.2893437Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:54:21.2893681Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:54:21.2894099Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.2894856Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2894992Z File "", line 1, in 2023-01-11T22:54:21.2895210Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2895355Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2895560Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2895714Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2895913Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2896017Z self.run() 2023-01-11T22:54:21.2896228Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2896375Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2896724Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2896859Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2897226Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2897353Z getattr(self, test_name)() 2023-01-11T22:54:21.2897700Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2897799Z fn() 2023-01-11T22:54:21.2898167Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2898293Z test(self, **param_kwargs) 2023-01-11T22:54:21.2898652Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2898784Z return func(*args, **kwargs) 2023-01-11T22:54:21.2899035Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2899133Z self.run_subtests( 2023-01-11T22:54:21.2899490Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2899656Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2900019Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2900175Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2900551Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2900674Z output = model(*input) 2023-01-11T22:54:21.2901101Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2901241Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2901601Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2901777Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2902205Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2902339Z _lazy_init(state, module) 2023-01-11T22:54:21.2902704Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2902875Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2903274Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2903424Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2903744Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2903871Z return func(*args, **kwargs) 2023-01-11T22:54:21.2904251Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2904357Z p_assert( 2023-01-11T22:54:21.2904704Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2904833Z traceback.print_stack() 2023-01-11T22:54:21.2905235Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.2905983Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2906121Z File "", line 1, in 2023-01-11T22:54:21.2906316Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2906461Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2906681Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2906840Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2907057Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2907144Z self.run() 2023-01-11T22:54:21.2907349Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2907497Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2907843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2907981Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2908349Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2908475Z getattr(self, test_name)() 2023-01-11T22:54:21.2908835Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2908919Z fn() 2023-01-11T22:54:21.2909290Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2909415Z test(self, **param_kwargs) 2023-01-11T22:54:21.2909776Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2909903Z return func(*args, **kwargs) 2023-01-11T22:54:21.2910153Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2910335Z self.run_subtests( 2023-01-11T22:54:21.2910694Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2910840Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2911206Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2911410Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2911801Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2911921Z output = model(*input) 2023-01-11T22:54:21.2912249Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2912388Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2912765Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2912927Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2913298Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.2913427Z _lazy_init(state, module) 2023-01-11T22:54:21.2913782Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2913952Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2914350Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2914495Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2914835Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2914948Z return func(*args, **kwargs) 2023-01-11T22:54:21.2915331Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2915437Z p_assert( 2023-01-11T22:54:21.2915775Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2915902Z traceback.print_stack() 2023-01-11T22:54:21.2916154Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:54:21.2916401Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:54:21.2916803Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.2917550Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2917672Z File "", line 1, in 2023-01-11T22:54:21.2917884Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2918030Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2918238Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2918389Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2918605Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2918711Z self.run() 2023-01-11T22:54:21.2918916Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2919045Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2919390Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2919588Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2919960Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2920087Z getattr(self, test_name)() 2023-01-11T22:54:21.2920449Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2920549Z fn() 2023-01-11T22:54:21.2920947Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2921083Z test(self, **param_kwargs) 2023-01-11T22:54:21.2921449Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2921575Z return func(*args, **kwargs) 2023-01-11T22:54:21.2921825Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2921944Z self.run_subtests( 2023-01-11T22:54:21.2922300Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2922462Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2922806Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2922965Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2923347Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2923469Z output = model(*input) 2023-01-11T22:54:21.2923800Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2923940Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2924318Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2924497Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2924868Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2924973Z _lazy_init(state, module) 2023-01-11T22:54:21.2925330Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2925502Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2925904Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2926050Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2926389Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2926521Z return func(*args, **kwargs) 2023-01-11T22:54:21.2926902Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2926989Z p_assert( 2023-01-11T22:54:21.2927331Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2927460Z traceback.print_stack() 2023-01-11T22:54:21.2927866Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.2928612Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2928743Z File "", line 1, in 2023-01-11T22:54:21.2928954Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2929158Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2929344Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2929498Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2929716Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2929824Z self.run() 2023-01-11T22:54:21.2930079Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2930232Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2930584Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2930720Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2931068Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2931197Z getattr(self, test_name)() 2023-01-11T22:54:21.2931562Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2931663Z fn() 2023-01-11T22:54:21.2932032Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2932156Z test(self, **param_kwargs) 2023-01-11T22:54:21.2932516Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2932646Z return func(*args, **kwargs) 2023-01-11T22:54:21.2933034Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2933157Z self.run_subtests( 2023-01-11T22:54:21.2933520Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2933689Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2934055Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2934210Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2934587Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2934706Z output = model(*input) 2023-01-11T22:54:21.2935019Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2935160Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2935542Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2935719Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2936090Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.2936219Z _lazy_init(state, module) 2023-01-11T22:54:21.2936571Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2936741Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2937126Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2937273Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2937614Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2937741Z return func(*args, **kwargs) 2023-01-11T22:54:21.2938123Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2938227Z p_assert( 2023-01-11T22:54:21.2938666Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2938797Z traceback.print_stack() 2023-01-11T22:54:21.2939028Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:54:21.2939265Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:54:21.2939730Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.2940496Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2941238Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2941988Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2942732Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2943468Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2943605Z File "", line 1, in 2023-01-11T22:54:21.2943822Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2943969Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2944179Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2944332Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2944530Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2944638Z self.run() 2023-01-11T22:54:21.2944847Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2944998Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2945348Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2945519Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2945888Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2945995Z getattr(self, test_name)() 2023-01-11T22:54:21.2946362Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2946461Z fn() 2023-01-11T22:54:21.2946830Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2946957Z test(self, **param_kwargs) 2023-01-11T22:54:21.2947311Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2947436Z return func(*args, **kwargs) 2023-01-11T22:54:21.2947775Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2947871Z self.run_subtests( 2023-01-11T22:54:21.2948236Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2948401Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2948818Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2948981Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2949366Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2949490Z output = model(*input) 2023-01-11T22:54:21.2949819Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2949942Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2950330Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2950509Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2950880Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2951002Z _lazy_init(state, module) 2023-01-11T22:54:21.2951361Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2951532Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2951933Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2952080Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2952401Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2952531Z return func(*args, **kwargs) 2023-01-11T22:54:21.2952916Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2953020Z p_assert( 2023-01-11T22:54:21.2953357Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2953484Z traceback.print_stack() 2023-01-11T22:54:21.2953891Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.2954644Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2955395Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2956138Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2956879Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2957611Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2957788Z File "", line 1, in 2023-01-11T22:54:21.2958001Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2958227Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2958439Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2958593Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2958809Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2958916Z self.run() 2023-01-11T22:54:21.2959122Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2959256Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2959609Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2959745Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2960110Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2960236Z getattr(self, test_name)() 2023-01-11T22:54:21.2960601Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2960701Z fn() 2023-01-11T22:54:21.2961052Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2961179Z test(self, **param_kwargs) 2023-01-11T22:54:21.2961539Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2961672Z return func(*args, **kwargs) 2023-01-11T22:54:21.2961920Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2962035Z self.run_subtests( 2023-01-11T22:54:21.2962392Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2962556Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2962902Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2963057Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2963435Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2963557Z output = model(*input) 2023-01-11T22:54:21.2963909Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2964054Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2964440Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2964619Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2964989Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2965098Z _lazy_init(state, module) 2023-01-11T22:54:21.2965454Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2965624Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2966022Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2966166Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2966578Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2966706Z return func(*args, **kwargs) 2023-01-11T22:54:21.2967086Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2967173Z p_assert( 2023-01-11T22:54:21.2967569Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2967706Z traceback.print_stack() 2023-01-11T22:54:21.2967956Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:54:21.2968197Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:54:21.2968605Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.2969356Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2969494Z File "", line 1, in 2023-01-11T22:54:21.2969706Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2969836Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2970045Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2970198Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2970414Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2970520Z self.run() 2023-01-11T22:54:21.2970728Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2970882Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2971208Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2971342Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2971706Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2971832Z getattr(self, test_name)() 2023-01-11T22:54:21.2972196Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2972298Z fn() 2023-01-11T22:54:21.2972667Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2972793Z test(self, **param_kwargs) 2023-01-11T22:54:21.2973283Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2973416Z return func(*args, **kwargs) 2023-01-11T22:54:21.2973668Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2973783Z self.run_subtests( 2023-01-11T22:54:21.2974139Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2974305Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2974672Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2974826Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2975185Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2975309Z output = model(*input) 2023-01-11T22:54:21.2975637Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2975864Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2976251Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2976430Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2976801Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2976987Z _lazy_init(state, module) 2023-01-11T22:54:21.2977340Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2977513Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2977911Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2978057Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2978401Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2978528Z return func(*args, **kwargs) 2023-01-11T22:54:21.2978907Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2979012Z p_assert( 2023-01-11T22:54:21.2979336Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2979468Z traceback.print_stack() 2023-01-11T22:54:21.2979870Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.2980620Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.2980756Z File "", line 1, in 2023-01-11T22:54:21.2980972Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2981118Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2981329Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2981482Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2981682Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2981786Z self.run() 2023-01-11T22:54:21.2981992Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2982141Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2982484Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2982620Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2982994Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2983118Z getattr(self, test_name)() 2023-01-11T22:54:21.2983463Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2983565Z fn() 2023-01-11T22:54:21.2983935Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2984063Z test(self, **param_kwargs) 2023-01-11T22:54:21.2984421Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2984545Z return func(*args, **kwargs) 2023-01-11T22:54:21.2984791Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2984905Z self.run_subtests( 2023-01-11T22:54:21.2985314Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2985479Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2985843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2985998Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2986422Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2986548Z output = model(*input) 2023-01-11T22:54:21.2986884Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2987025Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2987386Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2987567Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2987935Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.2988059Z _lazy_init(state, module) 2023-01-11T22:54:21.2988412Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.2988585Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.2988985Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.2989133Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.2989454Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.2989583Z return func(*args, **kwargs) 2023-01-11T22:54:21.2989965Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.2990075Z p_assert( 2023-01-11T22:54:21.2990416Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.2990545Z traceback.print_stack() 2023-01-11T22:54:21.2990793Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:54:21.2991037Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:54:21.2991417Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.2992167Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.2992306Z File "", line 1, in 2023-01-11T22:54:21.2992519Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.2992667Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.2992873Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.2993027Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.2993244Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.2993352Z self.run() 2023-01-11T22:54:21.2993540Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.2993689Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.2994035Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.2994171Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.2994606Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.2994731Z getattr(self, test_name)() 2023-01-11T22:54:21.2995094Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.2995176Z fn() 2023-01-11T22:54:21.2995591Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.2995722Z test(self, **param_kwargs) 2023-01-11T22:54:21.2996087Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.2996213Z return func(*args, **kwargs) 2023-01-11T22:54:21.2996462Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.2996576Z self.run_subtests( 2023-01-11T22:54:21.2996935Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.2997080Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.2997442Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.2997602Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.2997980Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.2998102Z output = model(*input) 2023-01-11T22:54:21.2998428Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.2998570Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.2998950Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.2999114Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.2999484Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.2999606Z _lazy_init(state, module) 2023-01-11T22:54:21.2999963Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3000131Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3000533Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3000680Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3001023Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3001149Z return func(*args, **kwargs) 2023-01-11T22:54:21.3001511Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3001620Z p_assert( 2023-01-11T22:54:21.3001957Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3002083Z traceback.print_stack() 2023-01-11T22:54:21.3002487Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.3003241Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3003374Z File "", line 1, in 2023-01-11T22:54:21.3003588Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3003716Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3003989Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3004141Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3004356Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3004462Z self.run() 2023-01-11T22:54:21.3004665Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3004854Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3005213Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3005332Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3005695Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3005821Z getattr(self, test_name)() 2023-01-11T22:54:21.3006181Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3006287Z fn() 2023-01-11T22:54:21.3006654Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3006779Z test(self, **param_kwargs) 2023-01-11T22:54:21.3007136Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3007246Z return func(*args, **kwargs) 2023-01-11T22:54:21.3007496Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3007612Z self.run_subtests( 2023-01-11T22:54:21.3007963Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3008126Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3008491Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3008650Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3009027Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3009131Z output = model(*input) 2023-01-11T22:54:21.3009463Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3009606Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3009986Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3010161Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3010531Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.3010656Z _lazy_init(state, module) 2023-01-11T22:54:21.3011018Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3011171Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3011570Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3011716Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3012057Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3012186Z return func(*args, **kwargs) 2023-01-11T22:54:21.3012567Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3012674Z p_assert( 2023-01-11T22:54:21.3013181Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3013400Z traceback.print_stack() 2023-01-11T22:54:21.3013651Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:54:21.3013895Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:54:21.3014305Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.3015117Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3015260Z File "", line 1, in 2023-01-11T22:54:21.3015473Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3015618Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3015828Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3015962Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3016181Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3016287Z self.run() 2023-01-11T22:54:21.3016494Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3016645Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3016995Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3017131Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3017498Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3017606Z getattr(self, test_name)() 2023-01-11T22:54:21.3017968Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3018070Z fn() 2023-01-11T22:54:21.3018436Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3018558Z test(self, **param_kwargs) 2023-01-11T22:54:21.3018920Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3019051Z return func(*args, **kwargs) 2023-01-11T22:54:21.3019282Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3019397Z self.run_subtests( 2023-01-11T22:54:21.3019756Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3019920Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3020287Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3020446Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3020824Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3020944Z output = model(*input) 2023-01-11T22:54:21.3021256Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3021398Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3021780Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3021956Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3022326Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3022451Z _lazy_init(state, module) 2023-01-11T22:54:21.3022879Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3023048Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3023448Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3023574Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3023964Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3024100Z return func(*args, **kwargs) 2023-01-11T22:54:21.3024487Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3024592Z p_assert( 2023-01-11T22:54:21.3024930Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3025063Z traceback.print_stack() 2023-01-11T22:54:21.3025465Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.3026200Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3026337Z File "", line 1, in 2023-01-11T22:54:21.3026551Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3026697Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3026904Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3027057Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3027271Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3027381Z self.run() 2023-01-11T22:54:21.3027568Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3027716Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3028061Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3028196Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3028565Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3028692Z getattr(self, test_name)() 2023-01-11T22:54:21.3029055Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3029154Z fn() 2023-01-11T22:54:21.3029503Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3029634Z test(self, **param_kwargs) 2023-01-11T22:54:21.3029992Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3030119Z return func(*args, **kwargs) 2023-01-11T22:54:21.3030368Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3030481Z self.run_subtests( 2023-01-11T22:54:21.3030840Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3031002Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3031347Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3031504Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3031882Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3032065Z output = model(*input) 2023-01-11T22:54:21.3032400Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3032540Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3032918Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3033142Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3033504Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.3033629Z _lazy_init(state, module) 2023-01-11T22:54:21.3033985Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3034153Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3034558Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3034704Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3035041Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3035169Z return func(*args, **kwargs) 2023-01-11T22:54:21.3035533Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3035640Z p_assert( 2023-01-11T22:54:21.3035979Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3036110Z traceback.print_stack() 2023-01-11T22:54:21.3036359Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:54:21.3036598Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:54:21.3037009Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.3037763Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3038159Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 
2023-01-11T22:54:21.3038890Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3039135Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:54:21.3039352Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:54:21.3039750Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.3040151Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.3040396Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:54:21.3040635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:54:21.3041029Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.3041500Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.3041739Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:54:21.3041973Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:54:21.3042427Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 
2023-01-11T22:54:21.3042837Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.3043079Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:54:21.3043314Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:54:21.3043708Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.3044105Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.3044346Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:54:21.3044582Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:54:21.3044979Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.3045349Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.3045591Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:54:21.3045825Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:54:21.3046222Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.3046615Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.3047370Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3047617Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:54:21.3047856Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:54:21.3048250Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.3048648Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.3049398Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3050137Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3050877Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3051740Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3052486Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3053350Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3054080Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3054815Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3055545Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3056281Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3057010Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3057742Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3058479Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3059207Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3059934Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3060815Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3062070Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3062837Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3063573Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3064307Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3065060Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3065797Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3066528Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3067262Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3067998Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3068245Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:54:21.3068486Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:54:21.3068891Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 
2023-01-11T22:54:21.3069294Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.3070117Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3070891Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3071138Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:54:21.3071374Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:54:21.3071780Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.3072158Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.3072400Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:54:21.3072644Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:54:21.3073042Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.3073440Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 
2023-01-11T22:54:21.3073679Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:54:21.3073916Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:54:21.3074312Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.3074701Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.3074948Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 0 2023-01-11T22:54:21.3075166Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 1 2023-01-11T22:54:21.3075560Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:54:21.3075955Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:54:21.3076200Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 0 2023-01-11T22:54:21.3076442Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 1 2023-01-11T22:54:21.3076835Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:54:21.3077232Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 
2023-01-11T22:54:21.3077474Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 0 2023-01-11T22:54:21.3077710Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 1 2023-01-11T22:54:21.3078082Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:54:21.3078476Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:54:21.3078797Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 0 2023-01-11T22:54:21.3079031Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 1 2023-01-11T22:54:21.3079432Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:54:21.3079874Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:54:21.3080118Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 0 2023-01-11T22:54:21.3080356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 1 2023-01-11T22:54:21.3080752Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:54:21.3081145Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 
2023-01-11T22:54:21.3081364Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 0 2023-01-11T22:54:21.3081598Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 1 2023-01-11T22:54:21.3081989Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:54:21.3082380Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:54:21.3082618Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 0 2023-01-11T22:54:21.3082852Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 1 2023-01-11T22:54:21.3083244Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:54:21.3083632Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:54:21.3083870Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 0 2023-01-11T22:54:21.3084088Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 1 2023-01-11T22:54:21.3084480Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:54:21.3084869Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:54:21.3085617Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3086369Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3087111Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3087850Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3088695Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3089441Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3090171Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3090909Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3091654Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3092384Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3093305Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3094046Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3094778Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3095512Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3096242Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3096971Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3097815Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3098606Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3099353Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3100085Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3100820Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3101548Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3102282Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3103011Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3103737Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3104467Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3105198Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3105924Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3106751Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3107529Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3108271Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3109003Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3109255Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 0 2023-01-11T22:54:21.3109500Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 1 2023-01-11T22:54:21.3109906Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:54:21.3110282Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:54:21.3111020Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3111748Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3111994Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 0 2023-01-11T22:54:21.3112229Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 1 2023-01-11T22:54:21.3112629Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:54:21.3113359Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3113754Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:54:21.3114476Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3114716Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 0 2023-01-11T22:54:21.3115018Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 1 2023-01-11T22:54:21.3115415Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:54:21.3116193Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3116593Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:54:21.3117317Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3117543Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 0 2023-01-11T22:54:21.3117777Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 1 2023-01-11T22:54:21.3118169Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:54:21.3118893Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3119274Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:54:21.3119997Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3120111Z dist init r=1, world=2 2023-01-11T22:54:21.3120447Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3120766Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3121077Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3121385Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3121693Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3121983Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3122287Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:54:21.3122592Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3122894Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3123258Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3123639Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3123986Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3124104Z dist init r=0, world=2 2023-01-11T22:54:21.3124430Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3124746Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3125059Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3125347Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3125655Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.3125961Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3126261Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3126567Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3126870Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3127175Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3127478Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3127783Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3127889Z ok (7.018s) 2023-01-11T22:54:21.3128230Z test_mixture_of_experts_offload_true_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85594 2023-01-11T22:54:21.3128435Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85595 2023-01-11T22:54:21.3128822Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.3129006Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.3129392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.3129586Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.3129957Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.3130135Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.3130592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.3130787Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.3131015Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.3131261Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.3131714Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.3132128Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.3132360Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.3132589Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.3133790Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.3133910Z warnings.warn( 2023-01-11T22:54:21.3134934Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.3135052Z warnings.warn( 2023-01-11T22:54:21.3135297Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:54:21.3135523Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:54:21.3135921Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.3136462Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 
2023-01-11T22:54:21.3136574Z warnings.warn( 2023-01-11T22:54:21.3137324Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3137462Z File "", line 1, in 2023-01-11T22:54:21.3137682Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3137829Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3138036Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3138173Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3138390Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3138497Z self.run() 2023-01-11T22:54:21.3138701Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3138850Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3139198Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3139437Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3139811Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3139919Z getattr(self, test_name)() 2023-01-11T22:54:21.3140284Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3140386Z fn() 2023-01-11T22:54:21.3140845Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3140979Z test(self, **param_kwargs) 2023-01-11T22:54:21.3141344Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3141473Z return func(*args, **kwargs) 2023-01-11T22:54:21.3141724Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3141823Z self.run_subtests( 2023-01-11T22:54:21.3142190Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3142356Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3142720Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3142873Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3143255Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3143379Z output = model(*input) 2023-01-11T22:54:21.3143714Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3143838Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3144220Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3144401Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3144771Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3144895Z _lazy_init(state, module) 2023-01-11T22:54:21.3145250Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3145423Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3145822Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, 
in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3145950Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3146292Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3146421Z return func(*args, **kwargs) 2023-01-11T22:54:21.3146801Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3146906Z p_assert( 2023-01-11T22:54:21.3147247Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3147376Z traceback.print_stack() 2023-01-11T22:54:21.3147779Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.3148321Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2023-01-11T22:54:21.3148418Z warnings.warn( 2023-01-11T22:54:21.3149170Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3149373Z File "", line 1, in 2023-01-11T22:54:21.3149584Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3149730Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3149935Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3150131Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3150354Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3150442Z self.run() 2023-01-11T22:54:21.3150648Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3150799Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3151150Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3151291Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3151657Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3151784Z getattr(self, test_name)() 2023-01-11T22:54:21.3152146Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3152228Z fn() 2023-01-11T22:54:21.3152603Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3152729Z test(self, **param_kwargs) 2023-01-11T22:54:21.3153088Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3153215Z return func(*args, **kwargs) 2023-01-11T22:54:21.3153463Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3153582Z self.run_subtests( 2023-01-11T22:54:21.3153940Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3154086Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3154450Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3154602Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3154982Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3155106Z output = model(*input) 2023-01-11T22:54:21.3155434Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3155577Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3155957Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3156119Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3156492Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3156616Z _lazy_init(state, module) 2023-01-11T22:54:21.3156969Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3157143Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3157545Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3157688Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3158029Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3158138Z return func(*args, **kwargs) 2023-01-11T22:54:21.3158592Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3158698Z p_assert( 2023-01-11T22:54:21.3159036Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3159165Z traceback.print_stack() 2023-01-11T22:54:21.3159412Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:54:21.3159703Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:54:21.3160118Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.3160871Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3160991Z File "", line 1, in 2023-01-11T22:54:21.3161206Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3161352Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3161558Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3161710Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3161927Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3162039Z self.run() 2023-01-11T22:54:21.3162245Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3162375Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3162721Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3162862Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3163229Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3163354Z getattr(self, test_name)() 2023-01-11T22:54:21.3163718Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3163819Z fn() 2023-01-11T22:54:21.3164172Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3164298Z test(self, **param_kwargs) 2023-01-11T22:54:21.3164657Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3164785Z return func(*args, **kwargs) 2023-01-11T22:54:21.3165064Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3165186Z self.run_subtests( 2023-01-11T22:54:21.3165541Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3165706Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3166048Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3166201Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3166579Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3166703Z output = model(*input) 2023-01-11T22:54:21.3167035Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3167175Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3167556Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3167799Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3168177Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3168282Z _lazy_init(state, module) 2023-01-11T22:54:21.3168635Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3168856Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3169274Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3169420Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3169760Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3169887Z return func(*args, **kwargs) 2023-01-11T22:54:21.3170264Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3170355Z p_assert( 2023-01-11T22:54:21.3170695Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3170823Z traceback.print_stack() 2023-01-11T22:54:21.3171230Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.3171981Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3172115Z File "", line 1, in 2023-01-11T22:54:21.3172328Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3172548Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3172733Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3173040Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3173263Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3173369Z self.run() 2023-01-11T22:54:21.3173581Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3173732Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3174077Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3174215Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3174559Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3174685Z getattr(self, test_name)() 2023-01-11T22:54:21.3175048Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3175148Z fn() 2023-01-11T22:54:21.3175513Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3175637Z test(self, **param_kwargs) 2023-01-11T22:54:21.3176000Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3176127Z return func(*args, **kwargs) 2023-01-11T22:54:21.3176358Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3176474Z self.run_subtests( 2023-01-11T22:54:21.3176836Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3177000Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3177460Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3177617Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3177995Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3178118Z output = model(*input) 2023-01-11T22:54:21.3178483Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3178634Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3179027Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3179203Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3179573Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.3179700Z _lazy_init(state, module) 2023-01-11T22:54:21.3180055Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3180225Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3180606Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3180757Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3181097Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3181225Z return func(*args, **kwargs) 2023-01-11T22:54:21.3181605Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3181709Z p_assert( 2023-01-11T22:54:21.3182047Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3182180Z traceback.print_stack() 2023-01-11T22:54:21.3182408Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:54:21.3182652Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:54:21.3183056Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.3183814Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3183948Z File "", line 1, in 2023-01-11T22:54:21.3184161Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3184312Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3184518Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3184672Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3184869Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3184975Z self.run() 2023-01-11T22:54:21.3185182Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3185332Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3185677Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3185813Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3186179Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3186305Z getattr(self, test_name)() 2023-01-11T22:54:21.3186727Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3186826Z fn() 2023-01-11T22:54:21.3187200Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3187326Z test(self, **param_kwargs) 2023-01-11T22:54:21.3187730Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3187862Z return func(*args, **kwargs) 2023-01-11T22:54:21.3188114Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3188229Z self.run_subtests( 2023-01-11T22:54:21.3188573Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3188740Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3189108Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3189260Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3189635Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3189757Z output = model(*input) 2023-01-11T22:54:21.3190089Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3190230Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3190595Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3190773Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3191145Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3191272Z _lazy_init(state, module) 2023-01-11T22:54:21.3191628Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3191799Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3192307Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3192457Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3192780Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3192908Z return func(*args, **kwargs) 2023-01-11T22:54:21.3193284Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3193387Z p_assert( 2023-01-11T22:54:21.3193726Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3193859Z traceback.print_stack() 2023-01-11T22:54:21.3194260Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.3195011Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3195143Z File "", line 1, in 2023-01-11T22:54:21.3195339Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3195483Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3195684Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3195835Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3196113Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3196221Z self.run() 2023-01-11T22:54:21.3196428Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3196559Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3196909Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3197091Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3197471Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3197598Z getattr(self, test_name)() 2023-01-11T22:54:21.3197959Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3198060Z fn() 2023-01-11T22:54:21.3198428Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3198541Z test(self, **param_kwargs) 2023-01-11T22:54:21.3198900Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3199029Z return func(*args, **kwargs) 2023-01-11T22:54:21.3199282Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3199399Z self.run_subtests( 2023-01-11T22:54:21.3199755Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3199918Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3200282Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3200420Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3200797Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3200921Z output = model(*input) 2023-01-11T22:54:21.3201250Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3201390Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3201772Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3201952Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3202324Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.3202429Z _lazy_init(state, module) 2023-01-11T22:54:21.3202789Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3202960Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3203364Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3203507Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3203849Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3203977Z return func(*args, **kwargs) 2023-01-11T22:54:21.3204358Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3204445Z p_assert( 2023-01-11T22:54:21.3204782Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3204912Z traceback.print_stack() 2023-01-11T22:54:21.3205159Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:54:21.3205405Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:54:21.3205896Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.3206695Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3206834Z File "", line 1, in 2023-01-11T22:54:21.3207054Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3207181Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3207386Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3207541Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3207760Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3207863Z self.run() 2023-01-11T22:54:21.3208067Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3208215Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3208566Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3208688Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3209054Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3209180Z getattr(self, test_name)() 2023-01-11T22:54:21.3209544Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3209645Z fn() 2023-01-11T22:54:21.3210011Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3210140Z test(self, **param_kwargs) 2023-01-11T22:54:21.3210503Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3210614Z return func(*args, **kwargs) 2023-01-11T22:54:21.3210864Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3210983Z self.run_subtests( 2023-01-11T22:54:21.3211340Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3211507Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3211873Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3212029Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3212403Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3212512Z output = model(*input) 2023-01-11T22:54:21.3212842Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3213151Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3213540Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3213718Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3214089Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3214210Z _lazy_init(state, module) 2023-01-11T22:54:21.3214567Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3214718Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3215208Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3215354Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3215694Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3215821Z return func(*args, **kwargs) 2023-01-11T22:54:21.3216257Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3216369Z p_assert( 2023-01-11T22:54:21.3216714Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3216824Z traceback.print_stack() 2023-01-11T22:54:21.3217225Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.3217981Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3218116Z File "", line 1, in 2023-01-11T22:54:21.3218330Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3218479Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3218690Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3218841Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3219057Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3219146Z self.run() 2023-01-11T22:54:21.3219350Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3219499Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3219847Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3219983Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3220346Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3220472Z getattr(self, test_name)() 2023-01-11T22:54:21.3220816Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3220920Z fn() 2023-01-11T22:54:21.3221288Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3221415Z test(self, **param_kwargs) 2023-01-11T22:54:21.3221776Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3221907Z return func(*args, **kwargs) 2023-01-11T22:54:21.3222156Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3222271Z self.run_subtests( 2023-01-11T22:54:21.3222609Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3222774Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3223141Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3223296Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3223675Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3223792Z output = model(*input) 2023-01-11T22:54:21.3236823Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3237145Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3237594Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3237774Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3238151Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.3238316Z _lazy_init(state, module) 2023-01-11T22:54:21.3238695Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3238867Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3239273Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3239418Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3239763Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3239891Z return func(*args, **kwargs) 2023-01-11T22:54:21.3240272Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3240360Z p_assert( 2023-01-11T22:54:21.3240703Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3240833Z traceback.print_stack() 2023-01-11T22:54:21.3241082Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:54:21.3241327Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:54:21.3241729Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.3242488Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3242621Z File "", line 1, in 2023-01-11T22:54:21.3242836Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3242966Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3243171Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3243322Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3243539Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3243643Z self.run() 2023-01-11T22:54:21.3243846Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3243998Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3244324Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3244460Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3244824Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3244948Z getattr(self, test_name)() 2023-01-11T22:54:21.3245312Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3245411Z fn() 2023-01-11T22:54:21.3245777Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3245947Z test(self, **param_kwargs) 2023-01-11T22:54:21.3246296Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3246493Z return func(*args, **kwargs) 2023-01-11T22:54:21.3246744Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3246860Z self.run_subtests( 2023-01-11T22:54:21.3247222Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3247386Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3247799Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3247959Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3248323Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3248446Z output = model(*input) 2023-01-11T22:54:21.3248775Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3248922Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3249300Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3249475Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3249843Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3249969Z _lazy_init(state, module) 2023-01-11T22:54:21.3250305Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3250474Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3250870Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3251014Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3251356Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3251482Z return func(*args, **kwargs) 2023-01-11T22:54:21.3251857Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3251960Z p_assert( 2023-01-11T22:54:21.3252302Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3252413Z traceback.print_stack() 2023-01-11T22:54:21.3252814Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.3253843Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3253984Z File "", line 1, in 2023-01-11T22:54:21.3254196Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3254339Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3254542Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3254693Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3254894Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3255001Z self.run() 2023-01-11T22:54:21.3255203Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3255349Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3255694Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3255827Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3256322Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3256446Z getattr(self, test_name)() 2023-01-11T22:54:21.3256789Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3256889Z fn() 2023-01-11T22:54:21.3257354Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3257489Z test(self, **param_kwargs) 2023-01-11T22:54:21.3257854Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3257980Z return func(*args, **kwargs) 2023-01-11T22:54:21.3258229Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3258344Z self.run_subtests( 2023-01-11T22:54:21.3258688Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3258854Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3259222Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3259375Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3259753Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3259874Z output = model(*input) 2023-01-11T22:54:21.3260202Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3260341Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3260701Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3260880Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3261251Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.3261373Z _lazy_init(state, module) 2023-01-11T22:54:21.3261728Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3261897Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3262296Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3262441Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3262762Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3262890Z return func(*args, **kwargs) 2023-01-11T22:54:21.3263268Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3263376Z p_assert( 2023-01-11T22:54:21.3263712Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3263840Z traceback.print_stack() 2023-01-11T22:54:21.3264087Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:54:21.3264334Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:54:21.3264720Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.3265498Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3265708Z File "", line 1, in 2023-01-11T22:54:21.3265922Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3266066Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3266271Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3266423Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3266677Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3266790Z self.run() 2023-01-11T22:54:21.3266977Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3267124Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3267481Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3267614Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3267982Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3268107Z getattr(self, test_name)() 2023-01-11T22:54:21.3268472Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3268570Z fn() 2023-01-11T22:54:21.3268922Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3269047Z test(self, **param_kwargs) 2023-01-11T22:54:21.3269407Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3269533Z return func(*args, **kwargs) 2023-01-11T22:54:21.3269780Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3269894Z self.run_subtests( 2023-01-11T22:54:21.3270252Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3270399Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3270763Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3270916Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3271295Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3271417Z output = model(*input) 2023-01-11T22:54:21.3271746Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3271884Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3272263Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3272443Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3272792Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3272913Z _lazy_init(state, module) 2023-01-11T22:54:21.3273268Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3273439Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3273834Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3273978Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3274318Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3274443Z return func(*args, **kwargs) 2023-01-11T22:54:21.3274804Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3274969Z p_assert( 2023-01-11T22:54:21.3275314Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3275443Z traceback.print_stack() 2023-01-11T22:54:21.3275847Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.3276645Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3276786Z File "", line 1, in 2023-01-11T22:54:21.3277000Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3277152Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3277337Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3277488Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3277701Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3277806Z self.run() 2023-01-11T22:54:21.3278008Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3278159Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3278510Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3278628Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3278993Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3279118Z getattr(self, test_name)() 2023-01-11T22:54:21.3279478Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3279583Z fn() 2023-01-11T22:54:21.3279948Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3280072Z test(self, **param_kwargs) 2023-01-11T22:54:21.3280433Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3280545Z return func(*args, **kwargs) 2023-01-11T22:54:21.3280795Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3280909Z self.run_subtests( 2023-01-11T22:54:21.3281263Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3281425Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3281789Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3281949Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3282328Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3282432Z output = model(*input) 2023-01-11T22:54:21.3282762Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3282906Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3283284Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3283459Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3283825Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.3283948Z _lazy_init(state, module) 2023-01-11T22:54:21.3284390Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3284544Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3284940Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3285084Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3285465Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3285598Z return func(*args, **kwargs) 2023-01-11T22:54:21.3285982Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3286086Z p_assert( 2023-01-11T22:54:21.3286424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3286540Z traceback.print_stack() 2023-01-11T22:54:21.3286787Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:54:21.3287031Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:54:21.3287432Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.3288186Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3288320Z File "", line 1, in 2023-01-11T22:54:21.3288532Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3288678Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3288886Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3289022Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3289236Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3289341Z self.run() 2023-01-11T22:54:21.3289545Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3289694Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3290037Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3290172Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3290534Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3290643Z getattr(self, test_name)() 2023-01-11T22:54:21.3291004Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3291108Z fn() 2023-01-11T22:54:21.3291477Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3291601Z test(self, **param_kwargs) 2023-01-11T22:54:21.3291957Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3292087Z return func(*args, **kwargs) 2023-01-11T22:54:21.3292336Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3292433Z self.run_subtests( 2023-01-11T22:54:21.3292788Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3293148Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3293518Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3293762Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3294144Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3294266Z output = model(*input) 2023-01-11T22:54:21.3294650Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3294781Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3295166Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3295343Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3295711Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3295832Z _lazy_init(state, module) 2023-01-11T22:54:21.3296190Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3296361Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3296761Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3296888Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3297230Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3297355Z return func(*args, **kwargs) 2023-01-11T22:54:21.3297735Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3297838Z p_assert( 2023-01-11T22:54:21.3298173Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3298305Z traceback.print_stack() 2023-01-11T22:54:21.3298702Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.3299438Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3299572Z File "", line 1, in 2023-01-11T22:54:21.3299783Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3299928Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3300135Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3300285Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3300500Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3300609Z self.run() 2023-01-11T22:54:21.3300796Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3300943Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3301289Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3301424Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3301789Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3301913Z getattr(self, test_name)() 2023-01-11T22:54:21.3302278Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3302376Z fn() 2023-01-11T22:54:21.3302724Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3302918Z test(self, **param_kwargs) 2023-01-11T22:54:21.3303288Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3303415Z return func(*args, **kwargs) 2023-01-11T22:54:21.3303666Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3303779Z self.run_subtests( 2023-01-11T22:54:21.3304181Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3304352Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3304704Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3304858Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3305230Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3305356Z output = model(*input) 2023-01-11T22:54:21.3305685Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3305824Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3306204Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3306385Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3306737Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.3306861Z _lazy_init(state, module) 2023-01-11T22:54:21.3307213Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3307383Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3307784Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3307928Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3308266Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3308392Z return func(*args, **kwargs) 2023-01-11T22:54:21.3308756Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3308860Z p_assert( 2023-01-11T22:54:21.3309198Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3309326Z traceback.print_stack() 2023-01-11T22:54:21.3309571Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:54:21.3309815Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:54:21.3310222Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.3310974Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3311106Z File "", line 1, in 2023-01-11T22:54:21.3311303Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3311450Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3311656Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3311808Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3312021Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3312192Z self.run() 2023-01-11T22:54:21.3312398Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3312545Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3312877Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3313013Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3313429Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3313561Z getattr(self, test_name)() 2023-01-11T22:54:21.3313930Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3314030Z fn() 2023-01-11T22:54:21.3314395Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3314525Z test(self, **param_kwargs) 2023-01-11T22:54:21.3314866Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3314992Z return func(*args, **kwargs) 2023-01-11T22:54:21.3315240Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3315354Z self.run_subtests( 2023-01-11T22:54:21.3315712Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3315876Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3316239Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3316393Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3316750Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3316878Z output = model(*input) 2023-01-11T22:54:21.3317207Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3317345Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3317721Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3317901Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3318270Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3318392Z _lazy_init(state, module) 2023-01-11T22:54:21.3318730Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3318898Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3319301Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3319447Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3319786Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3319910Z return func(*args, **kwargs) 2023-01-11T22:54:21.3320290Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3320393Z p_assert( 2023-01-11T22:54:21.3320712Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3320838Z traceback.print_stack() 2023-01-11T22:54:21.3321239Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.3321988Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3322181Z File "", line 1, in 2023-01-11T22:54:21.3322395Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3322580Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3322789Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3322942Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3323140Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3323245Z self.run() 2023-01-11T22:54:21.3323451Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3323598Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3323953Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3324087Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3324450Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3324558Z getattr(self, test_name)() 2023-01-11T22:54:21.3324923Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3325022Z fn() 2023-01-11T22:54:21.3325389Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3325513Z test(self, **param_kwargs) 2023-01-11T22:54:21.3325871Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3325996Z return func(*args, **kwargs) 2023-01-11T22:54:21.3326247Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3326343Z self.run_subtests( 2023-01-11T22:54:21.3326698Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3326861Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3327232Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3327386Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3327761Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3327883Z output = model(*input) 2023-01-11T22:54:21.3328209Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3328336Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3328716Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3328890Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3329260Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.3329384Z _lazy_init(state, module) 2023-01-11T22:54:21.3329739Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3329908Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3330307Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3330450Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3330770Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3330961Z return func(*args, **kwargs) 2023-01-11T22:54:21.3331347Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3331450Z p_assert( 2023-01-11T22:54:21.3331786Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3331960Z traceback.print_stack() 2023-01-11T22:54:21.3332214Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:54:21.3332454Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:54:21.3332843Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.3333778Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3334533Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3335275Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3336021Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3336763Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3336900Z File "", line 1, in 2023-01-11T22:54:21.3337116Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3337258Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3337465Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3337616Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3337835Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3337924Z self.run() 2023-01-11T22:54:21.3338131Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3338276Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3338624Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3338761Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3339126Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3339250Z getattr(self, test_name)() 2023-01-11T22:54:21.3339612Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3339695Z fn() 2023-01-11T22:54:21.3340061Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3340276Z test(self, **param_kwargs) 2023-01-11T22:54:21.3340640Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3340766Z return func(*args, **kwargs) 2023-01-11T22:54:21.3341014Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3341191Z self.run_subtests( 2023-01-11T22:54:21.3341569Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3341716Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3342078Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3342235Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3342616Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3342738Z output = model(*input) 2023-01-11T22:54:21.3343067Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3343206Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3343586Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3343747Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3344116Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3344237Z _lazy_init(state, module) 2023-01-11T22:54:21.3344592Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3344767Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3345162Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3345305Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3345644Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3345753Z return func(*args, **kwargs) 2023-01-11T22:54:21.3346130Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3346233Z p_assert( 2023-01-11T22:54:21.3346573Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3346700Z traceback.print_stack() 2023-01-11T22:54:21.3347097Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.3347851Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3348595Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3349337Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3350169Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3350959Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3351095Z File "", line 1, in 2023-01-11T22:54:21.3351294Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3351439Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3351645Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3351805Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3352020Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3352125Z self.run() 2023-01-11T22:54:21.3352329Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3352478Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3352817Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3352952Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3353315Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3353440Z getattr(self, test_name)() 2023-01-11T22:54:21.3353799Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3353901Z fn() 2023-01-11T22:54:21.3354265Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3354389Z test(self, **param_kwargs) 2023-01-11T22:54:21.3354730Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3354855Z return func(*args, **kwargs) 2023-01-11T22:54:21.3355110Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3355225Z self.run_subtests( 2023-01-11T22:54:21.3355579Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3355741Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3356102Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3356258Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3356616Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3356740Z output = model(*input) 2023-01-11T22:54:21.3357067Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3357207Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3357591Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3357767Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3358139Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3358261Z _lazy_init(state, module) 2023-01-11T22:54:21.3358599Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3358831Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3359235Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3359379Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3359714Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3359889Z return func(*args, **kwargs) 2023-01-11T22:54:21.3360281Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3360386Z p_assert( 2023-01-11T22:54:21.3360705Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3360831Z traceback.print_stack() 2023-01-11T22:54:21.3361078Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:54:21.3361321Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:54:21.3361718Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.3362469Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3362603Z File "", line 1, in 2023-01-11T22:54:21.3362816Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3362959Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3363148Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3363303Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3363517Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3363622Z self.run() 2023-01-11T22:54:21.3363826Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3363973Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3364321Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3364456Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3364806Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3364931Z getattr(self, test_name)() 2023-01-11T22:54:21.3365292Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3365395Z fn() 2023-01-11T22:54:21.3365760Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3365883Z test(self, **param_kwargs) 2023-01-11T22:54:21.3366270Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3366396Z return func(*args, **kwargs) 2023-01-11T22:54:21.3366629Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3366742Z self.run_subtests( 2023-01-11T22:54:21.3367098Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3367262Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3367622Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3367839Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3368219Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3368339Z output = model(*input) 2023-01-11T22:54:21.3368649Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3368789Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3369218Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3369406Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3369779Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3369901Z _lazy_init(state, module) 2023-01-11T22:54:21.3370253Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3370424Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3370808Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3370951Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3371290Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3371420Z return func(*args, **kwargs) 2023-01-11T22:54:21.3371795Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3371896Z p_assert( 2023-01-11T22:54:21.3372230Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3372357Z traceback.print_stack() 2023-01-11T22:54:21.3372740Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.3373669Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3373804Z File "", line 1, in 2023-01-11T22:54:21.3374024Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3374167Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3374373Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3374526Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3374739Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3374826Z self.run() 2023-01-11T22:54:21.3375034Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3375179Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3375525Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3375661Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3376027Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3376152Z getattr(self, test_name)() 2023-01-11T22:54:21.3376512Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3376595Z fn() 2023-01-11T22:54:21.3376960Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3377085Z test(self, **param_kwargs) 2023-01-11T22:54:21.3377444Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3377658Z return func(*args, **kwargs) 2023-01-11T22:54:21.3377907Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3378023Z self.run_subtests( 2023-01-11T22:54:21.3378383Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3378587Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3378964Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3379119Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3379492Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3379613Z output = model(*input) 2023-01-11T22:54:21.3379944Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3380082Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3380460Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3380617Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3380987Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.3381110Z _lazy_init(state, module) 2023-01-11T22:54:21.3381467Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3381639Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3382037Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3382185Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3382523Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3382633Z return func(*args, **kwargs) 2023-01-11T22:54:21.3383009Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3383112Z p_assert( 2023-01-11T22:54:21.3383452Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3383580Z traceback.print_stack() 2023-01-11T22:54:21.3383828Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:54:21.3384065Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:54:21.3384464Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.3385221Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3385338Z File "", line 1, in 2023-01-11T22:54:21.3385556Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3385700Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3385904Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3386056Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3386269Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3386373Z self.run() 2023-01-11T22:54:21.3386644Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3386775Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3387124Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3387259Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3387670Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3387800Z getattr(self, test_name)() 2023-01-11T22:54:21.3388165Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3388265Z fn() 2023-01-11T22:54:21.3388629Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3388737Z test(self, **param_kwargs) 2023-01-11T22:54:21.3389093Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3389224Z return func(*args, **kwargs) 2023-01-11T22:54:21.3389473Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3389588Z self.run_subtests( 2023-01-11T22:54:21.3389943Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3390113Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3390471Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3390607Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3390979Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3391099Z output = model(*input) 2023-01-11T22:54:21.3391433Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3391572Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3391949Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3392124Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3392493Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3392599Z _lazy_init(state, module) 2023-01-11T22:54:21.3392952Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3393124Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3393522Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3393670Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3394008Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3394133Z return func(*args, **kwargs) 2023-01-11T22:54:21.3394510Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3394596Z p_assert( 2023-01-11T22:54:21.3394933Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3395063Z traceback.print_stack() 2023-01-11T22:54:21.3395462Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.3396213Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3396404Z File "", line 1, in 2023-01-11T22:54:21.3396617Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3396762Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3397009Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3397150Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3397366Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3397471Z self.run() 2023-01-11T22:54:21.3397675Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3397821Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3398171Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3398308Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3398654Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3398781Z getattr(self, test_name)() 2023-01-11T22:54:21.3399143Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3399244Z fn() 2023-01-11T22:54:21.3399608Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3399732Z test(self, **param_kwargs) 2023-01-11T22:54:21.3400089Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3400214Z return func(*args, **kwargs) 2023-01-11T22:54:21.3400445Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3400562Z self.run_subtests( 2023-01-11T22:54:21.3400918Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3401081Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3401441Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3401598Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3401972Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3402093Z output = model(*input) 2023-01-11T22:54:21.3402403Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3402542Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3402921Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3403103Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3403471Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.3403593Z _lazy_init(state, module) 2023-01-11T22:54:21.3403948Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3404116Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3404513Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3404639Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3404980Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3405192Z return func(*args, **kwargs) 2023-01-11T22:54:21.3405579Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3405681Z p_assert( 2023-01-11T22:54:21.3406018Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3406145Z traceback.print_stack() 2023-01-11T22:54:21.3406421Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:54:21.3406667Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:54:21.3407077Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.3407831Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3407967Z File "", line 1, in 2023-01-11T22:54:21.3408179Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3408324Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3408533Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3408686Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3408884Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3408989Z self.run() 2023-01-11T22:54:21.3409191Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3409339Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3409680Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3409817Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3410179Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3410303Z getattr(self, test_name)() 2023-01-11T22:54:21.3410646Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3410744Z fn() 2023-01-11T22:54:21.3411112Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3411236Z test(self, **param_kwargs) 2023-01-11T22:54:21.3411592Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3411718Z return func(*args, **kwargs) 2023-01-11T22:54:21.3411964Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3412082Z self.run_subtests( 2023-01-11T22:54:21.3412420Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3412587Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3413126Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3413291Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3413671Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3413791Z output = model(*input) 2023-01-11T22:54:21.3414118Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3414258Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3414617Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3414893Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3415271Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3415393Z _lazy_init(state, module) 2023-01-11T22:54:21.3415800Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3415976Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3416383Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3416526Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3416847Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3416980Z return func(*args, **kwargs) 2023-01-11T22:54:21.3417363Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3417468Z p_assert( 2023-01-11T22:54:21.3417806Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3417934Z traceback.print_stack() 2023-01-11T22:54:21.3418336Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.3419088Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3419220Z File "", line 1, in 2023-01-11T22:54:21.3419419Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3419564Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3419770Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3419926Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3420141Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3420245Z self.run() 2023-01-11T22:54:21.3420453Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3420601Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3420926Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3421062Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3421425Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3421554Z getattr(self, test_name)() 2023-01-11T22:54:21.3421920Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3422019Z fn() 2023-01-11T22:54:21.3422385Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3422491Z test(self, **param_kwargs) 2023-01-11T22:54:21.3422851Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3422976Z return func(*args, **kwargs) 2023-01-11T22:54:21.3423226Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 226, in test_mixture_of_experts 2023-01-11T22:54:21.3423340Z self.run_subtests( 2023-01-11T22:54:21.3423695Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3423916Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3424285Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3424422Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3424795Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3424916Z output = model(*input) 2023-01-11T22:54:21.3425294Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3425442Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3425825Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3426002Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3426373Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.3426499Z _lazy_init(state, module) 2023-01-11T22:54:21.3426836Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3427005Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3427404Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3427548Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3427889Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3428014Z return func(*args, **kwargs) 2023-01-11T22:54:21.3428390Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3428493Z p_assert( 2023-01-11T22:54:21.3428819Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3428947Z traceback.print_stack() 2023-01-11T22:54:21.3429196Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:54:21.3429434Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:54:21.3429837Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.3430588Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3430986Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 
2023-01-11T22:54:21.3431722Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3431967Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:54:21.3432201Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:54:21.3432574Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.3432971Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.3433216Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:54:21.3433515Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:54:21.3433917Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.3434313Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.3434600Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:54:21.3434842Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:54:21.3435237Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 
2023-01-11T22:54:21.3435629Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.3435856Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:54:21.3436247Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.3436492Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:54:21.3436889Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.3437127Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:54:21.3437515Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.3437756Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:54:21.3438153Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.3438390Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:54:21.3438764Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.3439007Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:54:21.3439400Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.3440152Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3440397Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:54:21.3440789Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.3441029Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:54:21.3441419Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.3442159Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3442894Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3443739Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3444490Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3445223Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3445962Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3446696Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3447427Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3448165Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3448895Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3449629Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3450359Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3451089Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3451814Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3452655Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3453543Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3454278Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3455014Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3455745Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3456471Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3457204Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3458234Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3459631Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3460956Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3461711Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3461941Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:54:21.3462300Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:54:21.3462855Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 
2023-01-11T22:54:21.3463403Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.3464234Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3464988Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3465234Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:54:21.3465470Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:54:21.3465869Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.3466263Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.3466534Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:54:21.3466752Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:54:21.3467149Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.3467546Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 
2023-01-11T22:54:21.3467784Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:54:21.3468017Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:54:21.3468412Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.3468805Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.3469044Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 0 2023-01-11T22:54:21.3469276Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 1 2023-01-11T22:54:21.3469654Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:54:21.3470050Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:54:21.3470287Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 0 2023-01-11T22:54:21.3470521Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 1 2023-01-11T22:54:21.3470912Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:54:21.3471302Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 
2023-01-11T22:54:21.3471538Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 0 2023-01-11T22:54:21.3471835Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 1 2023-01-11T22:54:21.3472234Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:54:21.3472625Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:54:21.3472893Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 0 2023-01-11T22:54:21.3473132Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 1 2023-01-11T22:54:21.3473528Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:54:21.3473917Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:54:21.3474159Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 0 2023-01-11T22:54:21.3474390Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 1 2023-01-11T22:54:21.3474778Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:54:21.3475170Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 
2023-01-11T22:54:21.3475408Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 0 2023-01-11T22:54:21.3475622Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 1 2023-01-11T22:54:21.3476012Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:54:21.3476400Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:54:21.3476641Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 0 2023-01-11T22:54:21.3476872Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 1 2023-01-11T22:54:21.3477266Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:54:21.3477652Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:54:21.3477887Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 0 2023-01-11T22:54:21.3478118Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 1 2023-01-11T22:54:21.3478507Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:54:21.3478884Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:54:21.3479632Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3480375Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3481116Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3481981Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3482728Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3483462Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3484201Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3484938Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3485667Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3486405Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3487137Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3487865Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3488597Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3489324Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3490052Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3490888Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3491626Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3492353Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3493639Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3494546Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3495275Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3496006Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3496729Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3497455Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3498194Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3498917Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3499637Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3500524Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3501266Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3501990Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3502718Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3503448Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3503695Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 0 2023-01-11T22:54:21.3503936Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 1 2023-01-11T22:54:21.3504337Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:54:21.3505069Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3505457Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 
2023-01-11T22:54:21.3506181Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3506422Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 0 2023-01-11T22:54:21.3506636Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 1 2023-01-11T22:54:21.3507024Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:54:21.3507751Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3508138Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:54:21.3508866Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3509159Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 0 2023-01-11T22:54:21.3509430Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 1 2023-01-11T22:54:21.3509840Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 
2023-01-11T22:54:21.3510573Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3510965Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:54:21.3511692Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3511929Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 0 2023-01-11T22:54:21.3512159Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 1 2023-01-11T22:54:21.3512529Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:54:21.3513255Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3513641Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:54:21.3514360Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3514475Z dist init r=0, world=2 2023-01-11T22:54:21.3514805Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3515121Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3515431Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3515738Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3516041Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3516342Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3516641Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3516983Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3517283Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3517625Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3517932Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3518232Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.3518349Z dist init r=1, world=2 2023-01-11T22:54:21.3518672Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3518987Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3519302Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3519608Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3519911Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3520214Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3520496Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3520799Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3521098Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3521399Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3521698Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3522000Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.3522102Z ok (6.818s) 2023-01-11T22:54:21.3522465Z test_mixture_of_experts_with_delay_before_free_offload_false_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85941 2023-01-11T22:54:21.3522686Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85942 2023-01-11T22:54:21.3523073Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.3523233Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.3523600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.3523844Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.3524229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.3524420Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.3524848Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.3525045Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.3525291Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.3525534Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.3525921Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.3526318Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.3526550Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.3526778Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.3527804Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.3527918Z warnings.warn( 2023-01-11T22:54:21.3528940Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.3529054Z warnings.warn( 2023-01-11T22:54:21.3529303Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:54:21.3529547Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:54:21.3529945Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.3530692Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3531076Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.3531819Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3532060Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:54:21.3532305Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:54:21.3532698Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.3534100Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.3534349Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:54:21.3534830Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.3535076Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:54:21.3535472Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.3535694Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:54:21.3536080Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 
2023-01-11T22:54:21.3536317Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:54:21.3536710Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.3536949Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:54:21.3537188Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:54:21.3537571Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.3537961Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.3538199Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:54:21.3538586Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.3538808Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:54:21.3539195Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.3539435Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:54:21.3539830Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.3540066Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:54:21.3540455Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 
2023-01-11T22:54:21.3540695Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:54:21.3540930Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:54:21.3541316Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.3541687Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.3542437Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3543181Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3544077Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3544835Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3545575Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3546320Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3547093Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3547834Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3548570Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3549299Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3550028Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3550764Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3551492Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3552218Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3553067Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3553810Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3554539Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3555274Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3555999Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3556728Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3557458Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3558184Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3558908Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3559642Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3560367Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3561094Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3561930Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3562671Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3563395Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3564126Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3564850Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3565576Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3566303Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3567054Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3567780Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3568505Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3569232Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3569961Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3570747Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3571523Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3571778Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:54:21.3572018Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:54:21.3572429Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.3572827Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.3574105Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:54:21.3574336Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:54:21.3574738Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.3575133Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 
2023-01-11T22:54:21.3575374Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:54:21.3575612Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:54:21.3576001Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.3576393Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.3576633Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:54:21.3577023Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.3577247Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:54:21.3577641Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.3577883Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:54:21.3578273Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.3578514Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:54:21.3578908Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 
2023-01-11T22:54:21.3579147Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:54:21.3579380Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:54:21.3579768Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.3580258Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.3580479Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:54:21.3580866Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.3581161Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:54:21.3581565Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.3581801Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:54:21.3582033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:54:21.3582421Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.3582813Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 
2023-01-11T22:54:21.3583051Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:54:21.3583271Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:54:21.3583663Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.3584048Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.3584284Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:54:21.3584516Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:54:21.3584908Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.3585297Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.3585531Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:54:21.3585765Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:54:21.3586134Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.3586522Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.3586758Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:54:21.3587151Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 
2023-01-11T22:54:21.3587392Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:54:21.3587782Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.3588529Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3589269Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3590068Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3590847Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3591592Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3592323Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3593052Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3593780Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3594510Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3595241Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3595967Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3596699Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3597427Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3598153Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3598952Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3599722Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3600456Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3601187Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3601917Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3602644Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3603372Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3604099Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3604827Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3605556Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3606280Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3607003Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3607806Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3608568Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3609300Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3610029Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3610753Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3611479Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3612207Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3613714Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3614468Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3615201Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3615925Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3616652Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3617477Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3618255Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3618976Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3619709Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3620432Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3621155Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3621880Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3622607Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3623327Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3624055Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3624779Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3625510Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3626300Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3627070Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3627807Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3628536Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3629261Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3629987Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3630717Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3631442Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3631690Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:54:21.3632414Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3633148Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3633866Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3634590Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3634888Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:54:21.3635293Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.3635734Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.3636475Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3637207Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3637453Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:54:21.3637688Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:54:21.3638082Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.3638458Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 
2023-01-11T22:54:21.3639189Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3639434Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:54:21.3639669Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:54:21.3640064Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.3640457Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.3641192Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3641436Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:54:21.3641670Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:54:21.3642065Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.3642463Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.3643204Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3643301Z dist init r=1, world=2 2023-01-11T22:54:21.3643471Z dist init r=0, world=2 2023-01-11T22:54:21.3643574Z ok (27.458s) 2023-01-11T22:54:21.3643929Z test_mixture_of_experts_with_delay_before_free_offload_false_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86264 2023-01-11T22:54:21.3644150Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86265 2023-01-11T22:54:21.3644576Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.3644760Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.3645150Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.3645324Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.3645689Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.3645868Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.3646248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.3646438Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.3646683Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.3646931Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.3647329Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based 
barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.3647725Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.3647937Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.3648168Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.3649190Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.3649309Z warnings.warn( 2023-01-11T22:54:21.3650323Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.3650439Z warnings.warn( 2023-01-11T22:54:21.3650686Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:54:21.3650928Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:54:21.3651328Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2023-01-11T22:54:21.3651859Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:54:21.3651971Z warnings.warn( 2023-01-11T22:54:21.3652712Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3654018Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.3654638Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:54:21.3654762Z warnings.warn( 2023-01-11T22:54:21.3655500Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3655746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:54:21.3655993Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:54:21.3656389Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.3656781Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 
2023-01-11T22:54:21.3657027Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:54:21.3657270Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:54:21.3657658Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.3658027Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.3658272Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:54:21.3658510Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:54:21.3658895Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.3659281Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.3659516Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:54:21.3659755Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:54:21.3660150Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.3660548Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 
2023-01-11T22:54:21.3660772Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:54:21.3661008Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:54:21.3661397Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.3661781Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.3662014Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:54:21.3662247Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:54:21.3662631Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.3663100Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.3663334Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:54:21.3663549Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:54:21.3663979Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.3664376Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.3665125Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3665873Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3666617Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3679276Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3680031Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3680770Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3681507Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3682240Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3682976Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3683709Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3684527Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3685291Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3686038Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3686770Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3687502Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3688227Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3689063Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3690129Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3690870Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3691600Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3692323Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3693951Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3694811Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3695594Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3695850Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:54:21.3696089Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:54:21.3696504Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.3696900Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.3697142Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:54:21.3697363Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:54:21.3697759Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.3698153Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.3698395Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:54:21.3698633Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:54:21.3699025Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.3699414Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 
2023-01-11T22:54:21.3699655Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:54:21.3699886Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:54:21.3700260Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.3700651Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.3700891Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:54:21.3701124Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:54:21.3701514Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.3701901Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.3702139Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:54:21.3702370Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:54:21.3702759Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.3703146Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 
2023-01-11T22:54:21.3703420Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:54:21.3703650Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:54:21.3704050Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.3704494Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.3704734Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:54:21.3704966Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:54:21.3705362Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.3705753Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.3705988Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:54:21.3706200Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:54:21.3706590Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.3706974Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 
2023-01-11T22:54:21.3707208Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:54:21.3707439Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:54:21.3707831Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.3708217Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.3708451Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:54:21.3708683Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:54:21.3709055Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.3709434Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.3709669Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:54:21.3709905Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:54:21.3710295Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.3710678Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.3711706Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:1341: UserWarning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.3711913Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2023-01-11T22:54:21.3712979Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:1341: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.3713219Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2023-01-11T22:54:21.3713466Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:54:21.3713701Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:54:21.3714102Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.3714474Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.3715220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3715470Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:54:21.3715706Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:54:21.3716097Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.3716488Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.3716733Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:54:21.3716965Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:54:21.3717355Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.3717745Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.3717966Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:54:21.3718200Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:54:21.3718588Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.3718979Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 
2023-01-11T22:54:21.3719095Z dist init r=1, world=2 2023-01-11T22:54:21.3719205Z dist init r=0, world=2 2023-01-11T22:54:21.3719308Z ok (28.862s) 2023-01-11T22:54:21.3719677Z test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86587 2023-01-11T22:54:21.3719885Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86588 2023-01-11T22:54:21.3720258Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.3720435Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.3720815Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.3721006Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.3721441Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.3721617Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.3721994Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.3722224Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.3722458Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.3722701Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.3723101Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.3723497Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.3723731Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.3723958Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.3724984Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.3725100Z warnings.warn( 2023-01-11T22:54:21.3725341Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:54:21.3726340Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.3726454Z warnings.warn( 2023-01-11T22:54:21.3726678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:54:21.3727074Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2023-01-11T22:54:21.3727608Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2023-01-11T22:54:21.3727723Z warnings.warn( 2023-01-11T22:54:21.3728470Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3728867Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.3729402Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2023-01-11T22:54:21.3729511Z warnings.warn( 2023-01-11T22:54:21.3730255Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3730553Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:54:21.3730957Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.3731222Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:54:21.3731630Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 
2023-01-11T22:54:21.3731872Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:54:21.3732110Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:54:21.3732495Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.3733700Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.3734126Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:54:21.3734369Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:54:21.3734767Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.3735160Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.3735382Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:54:21.3735623Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:54:21.3736018Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.3736400Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 
2023-01-11T22:54:21.3736639Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:54:21.3736879Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:54:21.3737259Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.3737646Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.3737882Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:54:21.3738102Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:54:21.3738493Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.3738888Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.3739129Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:54:21.3739368Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:54:21.3739755Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.3740143Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.3740892Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3741807Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3742564Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3743299Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3744044Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3744774Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3745503Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3746243Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3746977Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3747702Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3748438Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3749163Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3749894Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3750726Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3751469Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3752191Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3752925Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3753650Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3754374Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3755101Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3755827Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3756551Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3757282Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3758009Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3758239Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:54:21.3758534Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:54:21.3758943Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.3759342Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.3759625Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:54:21.3759866Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:54:21.3760266Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.3760659Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.3760901Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:54:21.3761115Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:54:21.3761508Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.3761900Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 
2023-01-11T22:54:21.3762140Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:54:21.3762373Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:54:21.3762762Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.3763157Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.3763394Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:54:21.3763627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:54:21.3764018Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.3764387Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.3764623Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:54:21.3764854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:54:21.3765248Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.3765637Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 
2023-01-11T22:54:21.3765873Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:54:21.3766107Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:54:21.3766499Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.3766883Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.3767100Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:54:21.3767330Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:54:21.3767789Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.3768204Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.3768440Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:54:21.3768724Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:54:21.3769126Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.3769510Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 
2023-01-11T22:54:21.3769746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:54:21.3769981Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:54:21.3770353Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.3770737Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.3770978Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:54:21.3771207Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:54:21.3771596Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.3771981Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.3772219Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:54:21.3772609Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.3772848Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:54:21.3774002Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.3775036Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:1341: UserWarning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.3775247Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2023-01-11T22:54:21.3776249Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:1341: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.3776447Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2023-01-11T22:54:21.3776689Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:54:21.3776922Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:54:21.3777319Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.3777811Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.3778613Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3778866Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:54:21.3779099Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:54:21.3779498Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.3779881Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.3780122Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:54:21.3780355Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:54:21.3780753Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.3781142Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.3781379Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:54:21.3781611Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:54:21.3781999Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.3782393Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.3782489Z dist init r=0, world=2 2023-01-11T22:54:21.3782600Z dist init r=1, world=2 2023-01-11T22:54:21.3782700Z ok (28.861s) 2023-01-11T22:54:21.3783063Z test_mixture_of_experts_with_delay_before_free_offload_true_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86910 2023-01-11T22:54:21.3783285Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86911 2023-01-11T22:54:21.3783658Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.3783833Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.3784213Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.3784391Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.3784758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.3784932Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.3785311Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.3785500Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.3785743Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.3785984Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.3786379Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.3786840Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.3787052Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.3787280Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.3788345Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.3788466Z warnings.warn( 2023-01-11T22:54:21.3788715Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:54:21.3789723Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.3789835Z warnings.warn( 2023-01-11T22:54:21.3790071Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:54:21.3790462Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.3791209Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3791606Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.3791737Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.3791937Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3792082Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3792287Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3792439Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3792654Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3792759Z self.run() 2023-01-11T22:54:21.3792966Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3793112Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3793440Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3793574Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3793936Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3794063Z getattr(self, test_name)() 2023-01-11T22:54:21.3794425Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3794524Z fn() 2023-01-11T22:54:21.3794891Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3795015Z test(self, **param_kwargs) 2023-01-11T22:54:21.3795357Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3795543Z return func(*args, **kwargs) 2023-01-11T22:54:21.3795823Z File
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3795938Z self.run_subtests( 2023-01-11T22:54:21.3796302Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3796511Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3796888Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3797040Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3797402Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3797521Z output = model(*input) 2023-01-11T22:54:21.3797852Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3797991Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3798368Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3798543Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3798915Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3799037Z _lazy_init(state, module) 2023-01-11T22:54:21.3799373Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3799542Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3799939Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3800087Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3800426Z File
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3800551Z return func(*args, **kwargs) 2023-01-11T22:54:21.3800931Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3801033Z p_assert( 2023-01-11T22:54:21.3801357Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3801483Z traceback.print_stack() 2023-01-11T22:54:21.3802235Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3802369Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.3802581Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3802727Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3802931Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3803082Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3803283Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3803389Z self.run() 2023-01-11T22:54:21.3803592Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3803741Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3804082Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3804215Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3804662Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3804787Z getattr(self,
test_name)() 2023-01-11T22:54:21.3805132Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3805230Z fn() 2023-01-11T22:54:21.3805644Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3805774Z test(self, **param_kwargs) 2023-01-11T22:54:21.3806136Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3806262Z return func(*args, **kwargs) 2023-01-11T22:54:21.3806542Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3806657Z self.run_subtests( 2023-01-11T22:54:21.3807000Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3807163Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3807527Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3807681Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3808054Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3808173Z output = model(*input) 2023-01-11T22:54:21.3808499Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3808638Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3808997Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3809176Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3809544Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3809666Z _lazy_init(state, module) 2023-01-11T22:54:21.3810020Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3810192Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3810590Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3810734Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3811054Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3811177Z return func(*args, **kwargs) 2023-01-11T22:54:21.3811551Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3811660Z p_assert( 2023-01-11T22:54:21.3811997Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3812124Z traceback.print_stack() 2023-01-11T22:54:21.3812371Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:54:21.3812618Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:54:21.3813690Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.3814207Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.3814961Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3815866Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3816007Z File "", line 1, in 2023-01-11T22:54:21.3816222Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3816366Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3816570Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3816720Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3816933Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3817025Z self.run() 2023-01-11T22:54:21.3817230Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3817376Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3817725Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3817857Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3818221Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3818346Z getattr(self, test_name)() 2023-01-11T22:54:21.3818704Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3818785Z fn() 2023-01-11T22:54:21.3819152Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in 
instantiated_test 2023-01-11T22:54:21.3819281Z test(self, **param_kwargs) 2023-01-11T22:54:21.3819636Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3819761Z return func(*args, **kwargs) 2023-01-11T22:54:21.3820040Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3820153Z self.run_subtests( 2023-01-11T22:54:21.3820511Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3820658Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3821023Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3821176Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3821554Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3821680Z output = model(*input) 2023-01-11T22:54:21.3822008Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3822146Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3822523Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3822684Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3823051Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3823173Z _lazy_init(state, module) 2023-01-11T22:54:21.3823526Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3823694Z _share_state_and_init_handle_attrs(state, 
root_module) 2023-01-11T22:54:21.3824159Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3824302Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3824639Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3824747Z return func(*args, **kwargs) 2023-01-11T22:54:21.3825170Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3825281Z p_assert( 2023-01-11T22:54:21.3825624Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3825750Z traceback.print_stack() 2023-01-11T22:54:21.3825881Z File "", line 1, in 2023-01-11T22:54:21.3826091Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3826239Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3826424Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3826574Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3826787Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3826891Z self.run() 2023-01-11T22:54:21.3827093Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3827244Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3827589Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3827706Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3828065Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3828188Z getattr(self, test_name)() 2023-01-11T22:54:21.3828545Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3828647Z fn() 2023-01-11T22:54:21.3829011Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3829133Z test(self, **param_kwargs) 2023-01-11T22:54:21.3829494Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3829604Z return func(*args, **kwargs) 2023-01-11T22:54:21.3829883Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3829995Z self.run_subtests( 2023-01-11T22:54:21.3830347Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3830512Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3830881Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3831034Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3831408Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3831511Z output = model(*input) 2023-01-11T22:54:21.3831844Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3831983Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3832362Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3832537Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3832905Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.3833090Z _lazy_init(state, module) 2023-01-11T22:54:21.3833450Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3833617Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3833999Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3834188Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3834544Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3834670Z return func(*args, **kwargs) 2023-01-11T22:54:21.3835048Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3835151Z p_assert( 2023-01-11T22:54:21.3835485Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3835614Z traceback.print_stack() 2023-01-11T22:54:21.3835845Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:54:21.3836088Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:54:21.3836493Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.3837243Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3837376Z File "", line 1, in 2023-01-11T22:54:21.3837587Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3837734Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3837937Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3838086Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3838284Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3838386Z self.run() 2023-01-11T22:54:21.3838592Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3838740Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3839083Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3839217Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3839579Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3839685Z getattr(self, test_name)() 2023-01-11T22:54:21.3840050Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3840147Z fn() 2023-01-11T22:54:21.3840513Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3840635Z test(self, **param_kwargs) 2023-01-11T22:54:21.3840994Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3841119Z return func(*args, **kwargs) 2023-01-11T22:54:21.3841397Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3841495Z self.run_subtests( 2023-01-11T22:54:21.3841849Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3842011Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3842442Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3842597Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3842970Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3843090Z output = model(*input) 2023-01-11T22:54:21.3843463Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3843592Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3843975Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3844151Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3844518Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3844642Z _lazy_init(state, module) 2023-01-11T22:54:21.3844996Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3845164Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3845560Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3845706Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3846027Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3846156Z return func(*args, **kwargs) 2023-01-11T22:54:21.3846536Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3846638Z p_assert( 2023-01-11T22:54:21.3847012Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3847143Z traceback.print_stack() 2023-01-11T22:54:21.3847544Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.3848294Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3848410Z File "", line 1, in 2023-01-11T22:54:21.3848621Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3848766Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3848970Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3849122Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3849337Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3849442Z self.run() 2023-01-11T22:54:21.3849645Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3849776Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3850119Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3850257Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3850621Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3850745Z getattr(self, test_name)() 2023-01-11T22:54:21.3851102Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3851201Z fn() 2023-01-11T22:54:21.3851566Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3851765Z test(self, **param_kwargs) 2023-01-11T22:54:21.3852132Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3852259Z return func(*args, **kwargs) 2023-01-11T22:54:21.3852541Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3852705Z self.run_subtests( 2023-01-11T22:54:21.3853778Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3853981Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3854358Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3854495Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3854877Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3855002Z output = model(*input) 2023-01-11T22:54:21.3855328Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3855469Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3855850Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3856025Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3856394Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:54:21.3856499Z _lazy_init(state, module) 2023-01-11T22:54:21.3856851Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3857023Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3857424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3857567Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3857903Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3858028Z return func(*args, **kwargs) 2023-01-11T22:54:21.3858410Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3858495Z p_assert( 2023-01-11T22:54:21.3858834Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3858960Z traceback.print_stack() 2023-01-11T22:54:21.3859205Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:54:21.3859454Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:54:21.3859856Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.3860608Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3860739Z File "", line 1, in 2023-01-11T22:54:21.3860951Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3861078Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3861283Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3861434Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3861750Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3861854Z self.run() 2023-01-11T22:54:21.3862058Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3862205Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3862555Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3862732Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3863115Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3863241Z getattr(self, test_name)() 2023-01-11T22:54:21.3863601Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3863699Z fn() 2023-01-11T22:54:21.3864063Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3864193Z test(self, **param_kwargs) 2023-01-11T22:54:21.3864534Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3864658Z return func(*args, **kwargs) 2023-01-11T22:54:21.3864938Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3865051Z self.run_subtests( 2023-01-11T22:54:21.3865405Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3865567Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3865930Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3866083Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3866464Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3866565Z output = model(*input) 2023-01-11T22:54:21.3866891Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3867029Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3867407Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3867582Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3867948Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3868070Z _lazy_init(state, module) 2023-01-11T22:54:21.3868424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3868606Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3869011Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3869157Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3869494Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3869622Z return func(*args, **kwargs) 2023-01-11T22:54:21.3870000Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3870102Z p_assert( 2023-01-11T22:54:21.3870436Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3870547Z traceback.print_stack() 2023-01-11T22:54:21.3870947Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.3871784Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3871917Z File "", line 1, in 2023-01-11T22:54:21.3872179Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3872329Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3872532Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3872684Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3872880Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3872985Z self.run() 2023-01-11T22:54:21.3873191Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3873344Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3873691Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3873825Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3874185Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3874313Z getattr(self, test_name)() 2023-01-11T22:54:21.3874655Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3874752Z fn() 2023-01-11T22:54:21.3875119Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3875243Z test(self, **param_kwargs) 2023-01-11T22:54:21.3875599Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3875729Z return func(*args, **kwargs) 2023-01-11T22:54:21.3876007Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3876121Z self.run_subtests( 2023-01-11T22:54:21.3876461Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3876628Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3876994Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3877148Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3877525Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3877645Z output = model(*input) 2023-01-11T22:54:21.3877979Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3878118Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3878477Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3878652Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3879023Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:54:21.3879147Z _lazy_init(state, module) 2023-01-11T22:54:21.3879501Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3879669Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3880065Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3880270Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3880599Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3880725Z return func(*args, **kwargs) 2023-01-11T22:54:21.3881100Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3881205Z p_assert( 2023-01-11T22:54:21.3881592Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3881725Z traceback.print_stack() 2023-01-11T22:54:21.3881972Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:54:21.3882219Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:54:21.3882608Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.3883363Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3883495Z File "", line 1, in 2023-01-11T22:54:21.3883709Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3883854Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3884057Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3884208Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3884420Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3884525Z self.run() 2023-01-11T22:54:21.3884711Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3884862Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3885211Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3885347Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3885710Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3885836Z getattr(self, test_name)() 2023-01-11T22:54:21.3886198Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3886298Z fn() 2023-01-11T22:54:21.3886646Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3886771Z test(self, **param_kwargs) 2023-01-11T22:54:21.3887130Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3887260Z return func(*args, **kwargs) 2023-01-11T22:54:21.3887542Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3887657Z self.run_subtests( 2023-01-11T22:54:21.3888013Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3888179Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3888529Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3888684Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3889064Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3889186Z output = model(*input) 2023-01-11T22:54:21.3889580Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3889720Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3890098Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3890273Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3890675Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3890805Z _lazy_init(state, module) 2023-01-11T22:54:21.3891169Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3891338Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3891735Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3891883Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3892222Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3892349Z return func(*args, **kwargs) 2023-01-11T22:54:21.3892710Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3892813Z p_assert( 2023-01-11T22:54:21.3893679Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3893810Z traceback.print_stack() 2023-01-11T22:54:21.3894213Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.3894957Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3895092Z File "", line 1, in 2023-01-11T22:54:21.3895306Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3895451Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3895640Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3895791Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3896007Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3896113Z self.run() 2023-01-11T22:54:21.3896318Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3896465Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3896808Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3896928Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3897289Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3897414Z getattr(self, test_name)() 2023-01-11T22:54:21.3897773Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3897872Z fn() 2023-01-11T22:54:21.3898239Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3898364Z test(self, **param_kwargs) 2023-01-11T22:54:21.3898725Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3898834Z return func(*args, **kwargs) 2023-01-11T22:54:21.3899112Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3899319Z self.run_subtests( 2023-01-11T22:54:21.3899686Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3899848Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3900208Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3900424Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3900817Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3900920Z output = model(*input) 2023-01-11T22:54:21.3901246Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3901384Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3901758Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3901937Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3902307Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:54:21.3902428Z _lazy_init(state, module) 2023-01-11T22:54:21.3902786Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3902956Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3903341Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3903486Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3903823Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3903955Z return func(*args, **kwargs) 2023-01-11T22:54:21.3904331Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3904433Z p_assert( 2023-01-11T22:54:21.3904769Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3904895Z traceback.print_stack() 2023-01-11T22:54:21.3905126Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:54:21.3905370Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:54:21.3905769Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.3906518Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3906654Z File "", line 1, in 2023-01-11T22:54:21.3906866Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3907009Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3907214Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3907367Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3907563Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3907667Z self.run() 2023-01-11T22:54:21.3907870Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3908017Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3908362Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3908566Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3908935Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3909043Z getattr(self, test_name)() 2023-01-11T22:54:21.3909406Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3909552Z fn() 2023-01-11T22:54:21.3909931Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3910055Z test(self, **param_kwargs) 2023-01-11T22:54:21.3910409Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3910535Z return func(*args, **kwargs) 2023-01-11T22:54:21.3910814Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3910915Z self.run_subtests( 2023-01-11T22:54:21.3911271Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3911434Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3911801Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3911956Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3912333Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3912453Z output = model(*input) 2023-01-11T22:54:21.3912781Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3912904Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3913287Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3913463Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3913835Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3913958Z _lazy_init(state, module) 2023-01-11T22:54:21.3914315Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3914485Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3914883Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3915027Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3915350Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3915479Z return func(*args, **kwargs) 2023-01-11T22:54:21.3915860Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3915963Z p_assert( 2023-01-11T22:54:21.3916302Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3916430Z traceback.print_stack() 2023-01-11T22:54:21.3916833Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.3917581Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3917697Z File "", line 1, in 2023-01-11T22:54:21.3917971Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3918116Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3918319Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3918470Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3918682Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3918827Z self.run() 2023-01-11T22:54:21.3919034Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3919163Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3919515Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3919652Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3920015Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3920143Z getattr(self, test_name)() 2023-01-11T22:54:21.3920506Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3920605Z fn() 2023-01-11T22:54:21.3920973Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3921079Z test(self, **param_kwargs) 2023-01-11T22:54:21.3921441Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3921570Z return func(*args, **kwargs) 2023-01-11T22:54:21.3921847Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3921961Z self.run_subtests( 2023-01-11T22:54:21.3922314Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3922480Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3922843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3922980Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3923358Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3923481Z output = model(*input) 2023-01-11T22:54:21.3923807Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3923946Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3924326Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3924501Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3924874Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:54:21.3924978Z _lazy_init(state, module) 2023-01-11T22:54:21.3925331Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3925500Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3925901Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3926047Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3926385Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3926511Z return func(*args, **kwargs) 2023-01-11T22:54:21.3926888Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3927037Z p_assert( 2023-01-11T22:54:21.3927382Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3927510Z traceback.print_stack() 2023-01-11T22:54:21.3927756Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:54:21.3928000Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:54:21.3928447Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.3929207Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3929341Z File "", line 1, in 2023-01-11T22:54:21.3929559Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3929687Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3929890Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3930044Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3930259Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3930366Z self.run() 2023-01-11T22:54:21.3930569Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3930716Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3931058Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3931174Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3931536Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3931665Z getattr(self, test_name)() 2023-01-11T22:54:21.3932028Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3932126Z fn() 2023-01-11T22:54:21.3932488Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3932615Z test(self, **param_kwargs) 2023-01-11T22:54:21.3933354Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3933485Z return func(*args, **kwargs) 2023-01-11T22:54:21.3933767Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3933883Z self.run_subtests( 2023-01-11T22:54:21.3934243Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3934410Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3934774Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3934928Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3935303Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3935407Z output = model(*input) 2023-01-11T22:54:21.3935733Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3935875Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3936251Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3936427Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3936909Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3937030Z _lazy_init(state, module) 2023-01-11T22:54:21.3937382Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3937534Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3937991Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3938148Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3938494Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3938621Z return func(*args, **kwargs) 2023-01-11T22:54:21.3938997Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3939104Z p_assert( 2023-01-11T22:54:21.3939441Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3939552Z traceback.print_stack() 2023-01-11T22:54:21.3939955Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.3940708Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3940841Z File "", line 1, in 2023-01-11T22:54:21.3941053Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3941198Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3941404Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3941558Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3941755Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3941859Z self.run() 2023-01-11T22:54:21.3942060Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3942207Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3942556Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3942690Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3943053Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3943177Z getattr(self, test_name)() 2023-01-11T22:54:21.3943519Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3943620Z fn() 2023-01-11T22:54:21.3943988Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3944112Z test(self, **param_kwargs) 2023-01-11T22:54:21.3944468Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3944592Z return func(*args, **kwargs) 2023-01-11T22:54:21.3944872Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3944986Z self.run_subtests( 2023-01-11T22:54:21.3945321Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3945506Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3945878Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3946095Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3946480Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3946602Z output = model(*input) 2023-01-11T22:54:21.3946911Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3947097Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3947487Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3947665Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3948035Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:54:21.3948159Z _lazy_init(state, module) 2023-01-11T22:54:21.3948515Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3948684Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3949065Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3949210Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3949552Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3949680Z return func(*args, **kwargs) 2023-01-11T22:54:21.3950061Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3950164Z p_assert( 2023-01-11T22:54:21.3950498Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3950625Z traceback.print_stack() 2023-01-11T22:54:21.3950857Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:54:21.3951105Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:54:21.3951507Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.3952262Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3952397Z File "", line 1, in 2023-01-11T22:54:21.3952611Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3952755Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3952965Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3953117Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3953313Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3953420Z self.run() 2023-01-11T22:54:21.3953624Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3953775Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3954122Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3954257Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3954620Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3954746Z getattr(self, test_name)() 2023-01-11T22:54:21.3955089Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3955251Z fn() 2023-01-11T22:54:21.3955626Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3955752Z test(self, **param_kwargs) 2023-01-11T22:54:21.3956110Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3956237Z return func(*args, **kwargs) 2023-01-11T22:54:21.3956575Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3956696Z self.run_subtests( 2023-01-11T22:54:21.3957040Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3957203Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3957569Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3957724Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3958103Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3958225Z output = model(*input) 2023-01-11T22:54:21.3958551Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3958695Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3959054Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3959232Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3959602Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3959725Z _lazy_init(state, module) 2023-01-11T22:54:21.3960081Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3960253Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3960654Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3960800Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3961123Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3961254Z return func(*args, **kwargs) 2023-01-11T22:54:21.3961636Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3961740Z p_assert( 2023-01-11T22:54:21.3962077Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3962208Z traceback.print_stack() 2023-01-11T22:54:21.3962608Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.3963362Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3963494Z File "", line 1, in 2023-01-11T22:54:21.3963690Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3963837Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3964041Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3964194Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3964410Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3964576Z self.run() 2023-01-11T22:54:21.3964782Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3964931Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3965263Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3965397Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3965813Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3965945Z getattr(self, test_name)() 2023-01-11T22:54:21.3966316Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3966416Z fn() 2023-01-11T22:54:21.3966781Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3966892Z test(self, **param_kwargs) 2023-01-11T22:54:21.3967251Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3967375Z return func(*args, **kwargs) 2023-01-11T22:54:21.3967656Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3967770Z self.run_subtests( 2023-01-11T22:54:21.3968129Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3968293Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3968659Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3968812Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3969198Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3969323Z output = model(*input) 2023-01-11T22:54:21.3969653Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3969793Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3970172Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3970353Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3970724Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:54:21.3970846Z _lazy_init(state, module) 2023-01-11T22:54:21.3971181Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3971349Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3971754Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3971901Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3972239Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3972364Z return func(*args, **kwargs) 2023-01-11T22:54:21.3972743Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3972849Z p_assert( 2023-01-11T22:54:21.3973639Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3973766Z traceback.print_stack() 2023-01-11T22:54:21.3974014Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:54:21.3974255Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:54:21.3974749Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.3975553Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3976312Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3977052Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3977801Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3978535Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.3978669Z File "", line 1, in 2023-01-11T22:54:21.3978884Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3979014Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3979220Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3979373Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3979589Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3979696Z self.run() 2023-01-11T22:54:21.3979903Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3980053Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3980396Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3980513Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3980883Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3981009Z getattr(self, test_name)() 2023-01-11T22:54:21.3981376Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3981474Z fn() 2023-01-11T22:54:21.3981842Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3981967Z test(self, **param_kwargs) 2023-01-11T22:54:21.3982327Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3982435Z return func(*args, **kwargs) 2023-01-11T22:54:21.3982714Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3982827Z self.run_subtests( 2023-01-11T22:54:21.3983184Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3983350Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3983786Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3983941Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3984317Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3984420Z output = model(*input) 2023-01-11T22:54:21.3984794Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3984942Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3985328Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3985505Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3985873Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.3986001Z _lazy_init(state, module) 2023-01-11T22:54:21.3986354Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.3986505Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.3986909Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.3987056Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.3987396Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.3987521Z return func(*args, **kwargs) 2023-01-11T22:54:21.3987906Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.3988011Z p_assert( 2023-01-11T22:54:21.3988354Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.3988466Z traceback.print_stack() 2023-01-11T22:54:21.3988868Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.3989621Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3990363Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3991103Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3991850Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3992584Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.3992716Z File "", line 1, in 2023-01-11T22:54:21.3992987Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.3993134Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.3993342Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.3993495Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.3993710Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.3993861Z self.run() 2023-01-11T22:54:21.3994073Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.3994222Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.3994573Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.3994712Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.3995080Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.3995210Z getattr(self, test_name)() 2023-01-11T22:54:21.3995555Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.3995658Z fn() 2023-01-11T22:54:21.3996024Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.3996154Z test(self, **param_kwargs) 2023-01-11T22:54:21.3996512Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.3996639Z return func(*args, **kwargs) 2023-01-11T22:54:21.3996918Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.3997033Z self.run_subtests( 2023-01-11T22:54:21.3997371Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.3997540Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.3997909Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.3998065Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.3998446Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.3998568Z output = model(*input) 2023-01-11T22:54:21.3998896Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.3999037Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.3999401Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.3999580Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.3999956Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4000081Z _lazy_init(state, module) 2023-01-11T22:54:21.4000437Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4000607Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4001014Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4001162Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4001501Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4001610Z return func(*args, **kwargs) 2023-01-11T22:54:21.4001991Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4002165Z p_assert( 2023-01-11T22:54:21.4002511Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4002641Z traceback.print_stack() 2023-01-11T22:54:21.4002888Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:54:21.4003130Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:54:21.4003570Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.4004312Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4004452Z File "", line 1, in 2023-01-11T22:54:21.4004663Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4004808Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4005013Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4005166Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4005389Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4005496Z self.run() 2023-01-11T22:54:21.4005684Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4005832Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4006179Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4006315Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4006680Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4006806Z getattr(self, test_name)() 2023-01-11T22:54:21.4007166Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4007267Z fn() 2023-01-11T22:54:21.4007614Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4007742Z test(self, **param_kwargs) 2023-01-11T22:54:21.4008102Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4008227Z return func(*args, **kwargs) 2023-01-11T22:54:21.4008507Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4008621Z self.run_subtests( 2023-01-11T22:54:21.4008977Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4009145Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4009496Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4009652Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4010031Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4010155Z output = model(*input) 2023-01-11T22:54:21.4010482Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4010622Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4010998Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4011176Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4011594Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4011719Z _lazy_init(state, module) 2023-01-11T22:54:21.4012072Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4012241Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4012683Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4012835Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4013387Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4013515Z return func(*args, **kwargs) 2023-01-11T22:54:21.4013879Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4013990Z p_assert( 2023-01-11T22:54:21.4014331Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4014460Z traceback.print_stack() 2023-01-11T22:54:21.4014858Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.4015612Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4015746Z File "", line 1, in 2023-01-11T22:54:21.4015961Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4016106Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4016295Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4016450Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4016664Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4016769Z self.run() 2023-01-11T22:54:21.4016973Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4017124Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4017472Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4017607Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4017958Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4018084Z getattr(self, test_name)() 2023-01-11T22:54:21.4018445Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4018548Z fn() 2023-01-11T22:54:21.4018916Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4019039Z test(self, **param_kwargs) 2023-01-11T22:54:21.4019396Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4019504Z return func(*args, **kwargs) 2023-01-11T22:54:21.4019786Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4019903Z self.run_subtests( 2023-01-11T22:54:21.4020260Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4020423Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4020788Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4021028Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4021410Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4021531Z output = model(*input) 2023-01-11T22:54:21.4021843Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4022043Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4022439Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4022615Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4022985Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:54:21.4023108Z _lazy_init(state, module) 2023-01-11T22:54:21.4023469Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4023640Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4024019Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4024165Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4024511Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4024643Z return func(*args, **kwargs) 2023-01-11T22:54:21.4025022Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4025127Z p_assert( 2023-01-11T22:54:21.4025469Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4025601Z traceback.print_stack() 2023-01-11T22:54:21.4025832Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:54:21.4026071Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:54:21.4026474Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.4027225Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4027361Z File "", line 1, in 2023-01-11T22:54:21.4027576Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4027721Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4027929Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4028081Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4028280Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4028387Z self.run() 2023-01-11T22:54:21.4028591Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4028739Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4029084Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4029220Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4029584Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4029691Z getattr(self, test_name)() 2023-01-11T22:54:21.4030057Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4030221Z fn() 2023-01-11T22:54:21.4030596Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4030720Z test(self, **param_kwargs) 2023-01-11T22:54:21.4031077Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4031250Z return func(*args, **kwargs) 2023-01-11T22:54:21.4031537Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4031635Z self.run_subtests( 2023-01-11T22:54:21.4031998Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4032161Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4032526Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4032686Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4033061Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4033184Z output = model(*input) 2023-01-11T22:54:21.4033516Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4033638Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4034015Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4034192Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4034561Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4034684Z _lazy_init(state, module) 2023-01-11T22:54:21.4035043Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4035213Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4035611Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4035756Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4036079Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4036208Z return func(*args, **kwargs) 2023-01-11T22:54:21.4036589Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4036693Z p_assert( 2023-01-11T22:54:21.4037033Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4037166Z traceback.print_stack() 2023-01-11T22:54:21.4037566Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.4038314Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4038432Z File "", line 1, in 2023-01-11T22:54:21.4038644Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4038789Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4038992Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4039147Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4039362Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4039532Z self.run() 2023-01-11T22:54:21.4039737Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4039867Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4040217Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4040354Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4040771Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4040901Z getattr(self, test_name)() 2023-01-11T22:54:21.4041269Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4041368Z fn() 2023-01-11T22:54:21.4041735Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4041846Z test(self, **param_kwargs) 2023-01-11T22:54:21.4042204Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4042332Z return func(*args, **kwargs) 2023-01-11T22:54:21.4042612Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4042728Z self.run_subtests( 2023-01-11T22:54:21.4043086Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4043252Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4043618Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4043754Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4044127Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4044253Z output = model(*input) 2023-01-11T22:54:21.4044581Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4044720Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4045099Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4045279Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4045648Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:54:21.4045752Z _lazy_init(state, module) 2023-01-11T22:54:21.4046106Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4046281Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4046685Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4046829Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4047167Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4047294Z return func(*args, **kwargs) 2023-01-11T22:54:21.4047682Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4047770Z p_assert( 2023-01-11T22:54:21.4048110Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4048240Z traceback.print_stack() 2023-01-11T22:54:21.4048485Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:54:21.4048725Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:54:21.4049202Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.4049994Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4050133Z File "", line 1, in 2023-01-11T22:54:21.4050348Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4050474Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4050680Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4050833Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4051054Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4051160Z self.run() 2023-01-11T22:54:21.4051366Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4051515Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4051860Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4051981Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4052346Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4052470Z getattr(self, test_name)() 2023-01-11T22:54:21.4052835Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4053357Z fn() 2023-01-11T22:54:21.4053732Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4053863Z test(self, **param_kwargs) 2023-01-11T22:54:21.4054219Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4054326Z return func(*args, **kwargs) 2023-01-11T22:54:21.4054605Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4054725Z self.run_subtests( 2023-01-11T22:54:21.4055081Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4055245Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4055610Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4055765Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4056141Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4056249Z output = model(*input) 2023-01-11T22:54:21.4056582Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4056727Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4057108Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4057285Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4057653Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4057773Z _lazy_init(state, module) 2023-01-11T22:54:21.4058125Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4058276Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4058772Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4058920Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4059258Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4059382Z return func(*args, **kwargs) 2023-01-11T22:54:21.4059818Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4059932Z p_assert( 2023-01-11T22:54:21.4060278Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4060389Z traceback.print_stack() 2023-01-11T22:54:21.4060787Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.4061542Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4061677Z File "", line 1, in 2023-01-11T22:54:21.4061893Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4062037Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4062242Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4062394Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4062609Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4062698Z self.run() 2023-01-11T22:54:21.4062902Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4063056Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4063401Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4063538Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4063902Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4064025Z getattr(self, test_name)() 2023-01-11T22:54:21.4064376Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4064478Z fn() 2023-01-11T22:54:21.4064846Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4064969Z test(self, **param_kwargs) 2023-01-11T22:54:21.4065326Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4065458Z return func(*args, **kwargs) 2023-01-11T22:54:21.4065736Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4065852Z self.run_subtests( 2023-01-11T22:54:21.4066191Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4066362Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4066729Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4066881Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4067260Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4067382Z output = model(*input) 2023-01-11T22:54:21.4067712Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4067928Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4068297Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4068473Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4068890Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:54:21.4069019Z _lazy_init(state, module) 2023-01-11T22:54:21.4069380Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4069549Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4069974Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4070120Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4070462Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4070571Z return func(*args, **kwargs) 2023-01-11T22:54:21.4070951Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4071055Z p_assert( 2023-01-11T22:54:21.4071397Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4071525Z traceback.print_stack() 2023-01-11T22:54:21.4071771Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:54:21.4072012Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:54:21.4072411Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.4073149Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4073542Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.4074279Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4074520Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:54:21.4074754Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:54:21.4075151Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.4075547Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.4075787Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:54:21.4076035Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:54:21.4076435Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.4076830Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 
2023-01-11T22:54:21.4077048Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:54:21.4077342Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:54:21.4077740Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.4078133Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.4078423Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:54:21.4078663Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:54:21.4079061Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.4079451Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.4079697Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:54:21.4079914Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:54:21.4080303Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.4080696Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 
2023-01-11T22:54:21.4080934Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:54:21.4081166Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:54:21.4081560Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.4081956Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.4082708Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4082956Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:54:21.4083193Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:54:21.4083568Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.4083960Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.4084707Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4085449Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4086193Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4087006Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4087790Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4088537Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4089280Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4090017Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4090748Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4091482Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4092217Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4093361Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4094109Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4094837Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4095566Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4096381Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4097163Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4097906Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4098641Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4099368Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4100092Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4100820Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4101550Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4102273Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4103006Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4103733Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4104461Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4105249Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4106020Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4106756Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4107488Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4108217Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4108940Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4109666Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4110396Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4111123Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4111852Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4112580Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4112829Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:54:21.4113067Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:54:21.4113473Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.4113935Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.4114711Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4115452Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4115679Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:54:21.4115919Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:54:21.4116317Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.4116711Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.4117446Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4117690Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:54:21.4117927Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:54:21.4118323Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.4118717Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.4119458Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4119703Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:54:21.4119922Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:54:21.4120318Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.4120716Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.4121464Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4121709Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 1 2023-01-11T22:54:21.4121944Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 0 2023-01-11T22:54:21.4122340Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:54:21.4122732Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:54:21.4123543Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4123831Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 1 2023-01-11T22:54:21.4124073Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 0 2023-01-11T22:54:21.4124474Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:54:21.4124848Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:54:21.4125597Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4125839Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 1 2023-01-11T22:54:21.4126076Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 0 2023-01-11T22:54:21.4126469Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:54:21.4126864Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:54:21.4127604Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4127853Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 1 2023-01-11T22:54:21.4128089Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 0 2023-01-11T22:54:21.4128489Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:54:21.4128883Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:54:21.4129607Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4129852Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 1 2023-01-11T22:54:21.4130088Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 0 2023-01-11T22:54:21.4130477Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:54:21.4130874Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:54:21.4131617Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4131932Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 1 2023-01-11T22:54:21.4132167Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 0 2023-01-11T22:54:21.4132569Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:54:21.4133256Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:54:21.4134026Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4134267Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 1 2023-01-11T22:54:21.4134490Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 0 2023-01-11T22:54:21.4134882Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:54:21.4135276Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:54:21.4136015Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4136749Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4136992Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 1 2023-01-11T22:54:21.4137226Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 0 2023-01-11T22:54:21.4137621Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:54:21.4138016Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:54:21.4138750Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4139480Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4140225Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4140965Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4141698Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4142580Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4143322Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4144055Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4144788Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4145517Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4146241Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4146982Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4147745Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4148477Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4149212Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4149945Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4150669Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4151500Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4152238Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4152969Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4153703Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4154432Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4155158Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4155875Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4156601Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4157326Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4158059Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4158786Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4159511Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4160351Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4161092Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4161823Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4162556Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4163282Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4164005Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4164739Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4165461Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4166191Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4166923Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4167653Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4168380Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4169222Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4169965Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4170718Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4171452Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4172181Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4172424Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 1 2023-01-11T22:54:21.4172669Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 0 2023-01-11T22:54:21.4173288Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:54:21.4173691Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:54:21.4174424Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4175155Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4175404Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 1 2023-01-11T22:54:21.4175644Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 0 2023-01-11T22:54:21.4176049Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:54:21.4176428Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:54:21.4177165Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4177998Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4178244Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 1 2023-01-11T22:54:21.4178699Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 
2023-01-11T22:54:21.4178949Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 0 2023-01-11T22:54:21.4179354Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:54:21.4180088Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4180831Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4181073Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 1 2023-01-11T22:54:21.4181308Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 0 2023-01-11T22:54:21.4181705Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:54:21.4182101Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:54:21.4182840Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4183556Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4183673Z dist init r=1, world=2 2023-01-11T22:54:21.4184007Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4184326Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4184639Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4184946Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4185252Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4185552Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4185853Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4186216Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4186565Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:54:21.4186872Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4187157Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4187458Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4187574Z dist init r=0, world=2 2023-01-11T22:54:21.4187898Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4188214Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4188523Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4188827Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4189136Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4189444Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4189743Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.4190043Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4190326Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4190626Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4190931Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4191228Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4191332Z ok (30.164s) 2023-01-11T22:54:21.4191689Z test_mixture_of_experts_with_delay_before_free_offload_true_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87257 2023-01-11T22:54:21.4191913Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87258 2023-01-11T22:54:21.4192306Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.4192484Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.4192872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.4193124Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.4193502Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.4193681Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.4194110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.4194308Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.4194559Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.4194806Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.4195217Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.4195618Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.4195830Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.4196061Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.4197087Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.4197205Z warnings.warn( 2023-01-11T22:54:21.4198229Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.4198345Z warnings.warn( 2023-01-11T22:54:21.4198592Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:54:21.4198834Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:54:21.4199234Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.4199766Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 
2023-01-11T22:54:21.4199882Z warnings.warn( 2023-01-11T22:54:21.4200633Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4200751Z File "", line 1, in 2023-01-11T22:54:21.4200969Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4201115Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4201322Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4201475Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4201689Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4201856Z self.run() 2023-01-11T22:54:21.4202062Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4202193Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4202544Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4202679Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4203093Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4203224Z getattr(self, test_name)() 2023-01-11T22:54:21.4203593Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4203693Z fn() 2023-01-11T22:54:21.4204041Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4204174Z test(self, **param_kwargs) 2023-01-11T22:54:21.4204537Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4204665Z return func(*args, **kwargs) 2023-01-11T22:54:21.4204947Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4205063Z self.run_subtests( 2023-01-11T22:54:21.4205423Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4205589Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4205955Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4206092Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4206468Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4206593Z output = model(*input) 2023-01-11T22:54:21.4206920Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4207058Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4207437Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4207617Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4207983Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4208088Z _lazy_init(state, module) 2023-01-11T22:54:21.4208445Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4208615Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4209021Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4209166Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4209507Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4209636Z return func(*args, **kwargs) 2023-01-11T22:54:21.4210019Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4210104Z p_assert( 2023-01-11T22:54:21.4210444Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4210572Z traceback.print_stack() 2023-01-11T22:54:21.4210973Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.4211503Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2023-01-11T22:54:21.4211678Z warnings.warn( 2023-01-11T22:54:21.4212476Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4212615Z File "", line 1, in 2023-01-11T22:54:21.4212829Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4213185Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4213399Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4213551Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4213770Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4213878Z self.run() 2023-01-11T22:54:21.4214082Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4214231Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4214568Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4214707Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4215075Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4215200Z getattr(self, test_name)() 2023-01-11T22:54:21.4215561Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4215661Z fn() 2023-01-11T22:54:21.4216028Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4216158Z test(self, **param_kwargs) 2023-01-11T22:54:21.4216499Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4216629Z return func(*args, **kwargs) 2023-01-11T22:54:21.4216908Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4217028Z self.run_subtests( 2023-01-11T22:54:21.4217385Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4217552Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4217915Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4218070Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4218432Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4218560Z output = model(*input) 2023-01-11T22:54:21.4218890Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4219031Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4219412Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4219589Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4219956Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4220081Z _lazy_init(state, module) 2023-01-11T22:54:21.4220416Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4220585Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4221086Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4221232Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4221573Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4221701Z return func(*args, **kwargs) 2023-01-11T22:54:21.4222138Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4222251Z p_assert( 2023-01-11T22:54:21.4222597Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4222709Z traceback.print_stack() 2023-01-11T22:54:21.4222954Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:54:21.4223197Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:54:21.4223605Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.4224002Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.4224756Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4225501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4225638Z File "", line 1, in 2023-01-11T22:54:21.4225854Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4225996Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4226185Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4226339Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4226557Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4226661Z self.run() 2023-01-11T22:54:21.4226864Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4227014Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4227362Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4227479Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4227848Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4227974Z getattr(self, test_name)() 2023-01-11T22:54:21.4228337Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4228436Z fn() 2023-01-11T22:54:21.4228806Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4228933Z test(self, **param_kwargs) 2023-01-11T22:54:21.4229292Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4229401Z return func(*args, **kwargs) 2023-01-11T22:54:21.4229532Z File "", line 1, in 2023-01-11T22:54:21.4229811Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4229990Z self.run_subtests( 
2023-01-11T22:54:21.4230353Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4230516Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4230729Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4230874Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4231264Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4231428Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4231633Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4231786Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4232173Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4232300Z output = model(*input) 2023-01-11T22:54:21.4232514Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4232626Z self.run() 2023-01-11T22:54:21.4232937Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4233079Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4233287Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4233436Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4233813Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4233988Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4234329Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4234466Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4234818Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4234943Z _lazy_init(state, module) 2023-01-11T22:54:21.4235307Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4235433Z getattr(self, test_name)() 2023-01-11T22:54:21.4235794Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4235965Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4236324Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4236426Z fn() 2023-01-11T22:54:21.4236809Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4236957Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4237329Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4237457Z test(self, **param_kwargs) 2023-01-11T22:54:21.4237797Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4237929Z return func(*args, **kwargs) 2023-01-11T22:54:21.4238291Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4238415Z return func(*args, **kwargs) 2023-01-11T22:54:21.4238777Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4238882Z p_assert( 2023-01-11T22:54:21.4239163Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4239342Z 
self.run_subtests( 2023-01-11T22:54:21.4239689Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4239816Z traceback.print_stack() 2023-01-11T22:54:21.4240168Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4240377Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4240740Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4240896Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4241275Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4241397Z output = model(*input) 2023-01-11T22:54:21.4241728Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4241873Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4242252Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4242428Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4242780Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4242905Z _lazy_init(state, module) 2023-01-11T22:54:21.4243262Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4243430Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4243830Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4243980Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4244317Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4244444Z return func(*args, **kwargs) 2023-01-11T22:54:21.4244804Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4244965Z p_assert( 2023-01-11T22:54:21.4245310Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4245439Z traceback.print_stack() 2023-01-11T22:54:21.4245686Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:54:21.4245932Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:54:21.4246332Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.4246731Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.4247483Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4248228Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4248344Z File "", line 1, in 2023-01-11T22:54:21.4248559Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4248769Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4248974Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4249126Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4249344Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4249450Z self.run() 2023-01-11T22:54:21.4249695Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4249831Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4250187Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4250323Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4250687Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4250813Z getattr(self, test_name)() 2023-01-11T22:54:21.4251181Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4251284Z fn() 2023-01-11T22:54:21.4251634Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4251760Z test(self, **param_kwargs) 2023-01-11T22:54:21.4252121Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4252245Z return func(*args, **kwargs) 2023-01-11T22:54:21.4252526Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4252642Z self.run_subtests( 2023-01-11T22:54:21.4253209Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4253379Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4253755Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4253891Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4254267Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4254389Z output = model(*input) 2023-01-11T22:54:21.4254723Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4254863Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4255241Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4255418Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4255791Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4255901Z _lazy_init(state, module) 2023-01-11T22:54:21.4256255Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4256424Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4256824Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4256974Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4257314Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4257440Z return func(*args, **kwargs) 2023-01-11T22:54:21.4257818Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4257905Z p_assert( 2023-01-11T22:54:21.4258248Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4258476Z traceback.print_stack() 2023-01-11T22:54:21.4258608Z File "", line 1, in 2023-01-11T22:54:21.4258819Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4258965Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4259169Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4259358Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4259581Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4259685Z self.run() 2023-01-11T22:54:21.4259890Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4260039Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4260392Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4260532Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4260899Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4261010Z getattr(self, test_name)() 2023-01-11T22:54:21.4261368Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4261470Z fn() 2023-01-11T22:54:21.4261840Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4261963Z test(self, **param_kwargs) 2023-01-11T22:54:21.4262322Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.4262446Z return func(*args, **kwargs) 2023-01-11T22:54:21.4262723Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4262824Z self.run_subtests( 2023-01-11T22:54:21.4263177Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4263342Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4263712Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4263869Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4264245Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4264370Z output = model(*input) 2023-01-11T22:54:21.4264699Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4264821Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4265198Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4265377Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4265743Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4265865Z _lazy_init(state, module) 2023-01-11T22:54:21.4266226Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4266397Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4266800Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4266927Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.4267267Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4267454Z return func(*args, **kwargs) 2023-01-11T22:54:21.4267843Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4267947Z p_assert( 2023-01-11T22:54:21.4268283Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4268411Z traceback.print_stack() 2023-01-11T22:54:21.4268706Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:54:21.4268942Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:54:21.4269353Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.4269749Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.4270501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4271279Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4271414Z File "", line 1, in 2023-01-11T22:54:21.4271628Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4271775Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4271982Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4272139Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4272334Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4272441Z self.run() 2023-01-11T22:54:21.4272647Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4272797Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4273149Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4273285Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4273653Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4273779Z getattr(self, test_name)() 2023-01-11T22:54:21.4274123Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4274224Z fn() 2023-01-11T22:54:21.4274592Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4274719Z test(self, **param_kwargs) 2023-01-11T22:54:21.4275080Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4275207Z return func(*args, **kwargs) 2023-01-11T22:54:21.4275487Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4275602Z self.run_subtests( 2023-01-11T22:54:21.4275941Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4276105Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4276468Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4276622Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4277070Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4277193Z output = model(*input) 2023-01-11T22:54:21.4277522Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4277661Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4278097Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4278282Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4278657Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4278778Z _lazy_init(state, module) 2023-01-11T22:54:21.4279136Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4279310Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4279710Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4279856Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4279969Z File "", line 1, in 2023-01-11T22:54:21.4280313Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4280440Z return func(*args, **kwargs) 2023-01-11T22:54:21.4280819Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4280925Z p_assert( 2023-01-11T22:54:21.4281137Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4281278Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4290270Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4290455Z traceback.print_stack() 2023-01-11T22:54:21.4290677Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4290817Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4291037Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4291146Z self.run() 2023-01-11T22:54:21.4291363Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4291515Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4291905Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4292042Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4292396Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4292528Z getattr(self, test_name)() 2023-01-11T22:54:21.4293174Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4293291Z fn() 2023-01-11T22:54:21.4293676Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4293805Z test(self, **param_kwargs) 2023-01-11T22:54:21.4294171Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4294300Z return func(*args, 
**kwargs) 2023-01-11T22:54:21.4294563Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4294681Z self.run_subtests( 2023-01-11T22:54:21.4295041Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4295362Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4295741Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4295896Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4296273Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4296460Z output = model(*input) 2023-01-11T22:54:21.4296789Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4296932Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4297315Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4297494Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4297863Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4297990Z _lazy_init(state, module) 2023-01-11T22:54:21.4298346Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4298516Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4298920Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4299048Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4299385Z 
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4299514Z return func(*args, **kwargs) 2023-01-11T22:54:21.4299894Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4300000Z p_assert( 2023-01-11T22:54:21.4300349Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4300477Z traceback.print_stack() 2023-01-11T22:54:21.4300726Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:54:21.4300954Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:54:21.4301362Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.4301762Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.4302522Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4303280Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4303416Z File "", line 1, in 2023-01-11T22:54:21.4303635Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4303781Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4303989Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4304146Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4304342Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4304447Z self.run() 2023-01-11T22:54:21.4304718Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4304868Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4305000Z File "", line 1, in 2023-01-11T22:54:21.4305355Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4305491Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4305885Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4306018Z getattr(self, test_name)() 2023-01-11T22:54:21.4306234Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4306377Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4306747Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4306849Z fn() 2023-01-11T22:54:21.4307061Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4307215Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4307564Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4307690Z test(self, **param_kwargs) 2023-01-11T22:54:21.4307905Z File 
"/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4308017Z self.run() 2023-01-11T22:54:21.4308378Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4308507Z return func(*args, **kwargs) 2023-01-11T22:54:21.4308711Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4308840Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4309122Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4309243Z self.run_subtests( 2023-01-11T22:54:21.4309588Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4309722Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4310076Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4310243Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4310607Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4310732Z getattr(self, test_name)() 2023-01-11T22:54:21.4311080Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4311235Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4311604Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4311705Z fn() 2023-01-11T22:54:21.4312083Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4312207Z output = model(*input) 2023-01-11T22:54:21.4312579Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4312707Z test(self, **param_kwargs) 2023-01-11T22:54:21.4313016Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4313159Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4313520Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4313648Z return func(*args, **kwargs) 2023-01-11T22:54:21.4314025Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4314265Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4314549Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4314665Z self.run_subtests( 2023-01-11T22:54:21.4315070Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4315203Z _lazy_init(state, module) 2023-01-11T22:54:21.4315565Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4315730Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4316084Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4316260Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4316625Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4316779Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4317160Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4317310Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4317690Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4317811Z output = model(*input) 2023-01-11T22:54:21.4318153Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4318280Z return func(*args, **kwargs) 2023-01-11T22:54:21.4318607Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4318753Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4319113Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4319218Z p_assert( 2023-01-11T22:54:21.4319598Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4319780Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4320119Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4320251Z traceback.print_stack() 2023-01-11T22:54:21.4320621Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4320745Z _lazy_init(state, module) 2023-01-11T22:54:21.4321081Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4321254Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4321657Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in 
_share_state_and_init_handle_attrs 2023-01-11T22:54:21.4321802Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4322144Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4322272Z return func(*args, **kwargs) 2023-01-11T22:54:21.4322649Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4322755Z p_assert( 2023-01-11T22:54:21.4323076Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4323205Z traceback.print_stack() 2023-01-11T22:54:21.4323514Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:54:21.4323764Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:54:21.4324172Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.4324617Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.4325385Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4326133Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4326271Z File "", line 1, in 2023-01-11T22:54:21.4326487Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4326614Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4326824Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4326980Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4327198Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4327305Z self.run() 2023-01-11T22:54:21.4327512Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4327661Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4328010Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4328132Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4328496Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4328623Z getattr(self, test_name)() 2023-01-11T22:54:21.4328987Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4329094Z fn() 2023-01-11T22:54:21.4329462Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4329588Z test(self, **param_kwargs) 2023-01-11T22:54:21.4329927Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4330055Z return func(*args, **kwargs) 2023-01-11T22:54:21.4330334Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4330453Z self.run_subtests( 2023-01-11T22:54:21.4330811Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4330976Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4331349Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4331505Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4331882Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4331985Z output = model(*input) 2023-01-11T22:54:21.4332316Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4332458Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4333085Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4333272Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4333651Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4333778Z _lazy_init(state, module) 2023-01-11T22:54:21.4334214Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4334375Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4334784Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4334932Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4335272Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4335405Z return func(*args, **kwargs) 2023-01-11T22:54:21.4335537Z File "", line 1, in 2023-01-11T22:54:21.4335921Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4336028Z p_assert( 2023-01-11T22:54:21.4336345Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4336477Z traceback.print_stack() 2023-01-11T22:54:21.4336691Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4336837Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4337043Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4337198Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4337415Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4337506Z self.run() 2023-01-11T22:54:21.4337712Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4337862Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4338209Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4338345Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4338713Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4338839Z getattr(self, test_name)() 2023-01-11T22:54:21.4339201Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4339283Z fn() 2023-01-11T22:54:21.4339653Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4339779Z test(self, **param_kwargs) 2023-01-11T22:54:21.4340143Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4340270Z return func(*args, 
**kwargs) 2023-01-11T22:54:21.4340551Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4340669Z self.run_subtests( 2023-01-11T22:54:21.4341032Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4341178Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4341545Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4341701Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4342080Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4342284Z output = model(*input) 2023-01-11T22:54:21.4342622Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4342764Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4343142Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4343354Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4343740Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4343865Z _lazy_init(state, module) 2023-01-11T22:54:21.4344226Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4344397Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4344797Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4344949Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4345291Z 
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4345411Z return func(*args, **kwargs) 2023-01-11T22:54:21.4345799Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4345904Z p_assert( 2023-01-11T22:54:21.4346245Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4346375Z traceback.print_stack() 2023-01-11T22:54:21.4346624Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:54:21.4346871Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:54:21.4347279Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.4347678Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.4348416Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4349164Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4349304Z File "", line 1, in 2023-01-11T22:54:21.4349519Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4349666Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4349873Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4350026Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4350246Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4350354Z self.run() 2023-01-11T22:54:21.4350541Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4350691Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4351043Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4351179Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4351546Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4351734Z getattr(self, test_name)() 2023-01-11T22:54:21.4352109Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4352210Z fn() 2023-01-11T22:54:21.4352557Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4352730Z test(self, **param_kwargs) 2023-01-11T22:54:21.4353105Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4353233Z return func(*args, **kwargs) 2023-01-11T22:54:21.4353517Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4353636Z self.run_subtests( 2023-01-11T22:54:21.4353997Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4354166Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4354512Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4354668Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4355051Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4355173Z output = model(*input) 2023-01-11T22:54:21.4355502Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4355643Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4355773Z File "", line 1, in 2023-01-11T22:54:21.4356150Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4356313Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4356524Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4356670Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4357045Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4357168Z _lazy_init(state, module) 2023-01-11T22:54:21.4357375Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4357530Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4357884Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4358040Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4358255Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 
2023-01-11T22:54:21.4358367Z self.run() 2023-01-11T22:54:21.4358771Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4358914Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4359122Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4359270Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4359612Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4359722Z return func(*args, **kwargs) 2023-01-11T22:54:21.4360060Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4360199Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4360585Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4360765Z p_assert( 2023-01-11T22:54:21.4361137Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4361263Z getattr(self, test_name)() 2023-01-11T22:54:21.4361582Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4361710Z traceback.print_stack() 2023-01-11T22:54:21.4362124Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4362231Z fn() 2023-01-11T22:54:21.4362607Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4362732Z test(self, **param_kwargs) 2023-01-11T22:54:21.4363091Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4363221Z return func(*args, **kwargs) 
2023-01-11T22:54:21.4363488Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4363608Z self.run_subtests( 2023-01-11T22:54:21.4363966Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4364131Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4364502Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4364659Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4365038Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4365160Z output = model(*input) 2023-01-11T22:54:21.4365470Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4365614Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4365993Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4366170Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4366543Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4366670Z _lazy_init(state, module) 2023-01-11T22:54:21.4367029Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4367198Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4367597Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4367724Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4368070Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4368199Z return func(*args, **kwargs) 2023-01-11T22:54:21.4368581Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4368685Z p_assert( 2023-01-11T22:54:21.4369023Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4369156Z traceback.print_stack() 2023-01-11T22:54:21.4369407Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:54:21.4369634Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:54:21.4370036Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.4370436Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.4371287Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4371471Z File "", line 1, in 2023-01-11T22:54:21.4371691Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4371838Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4372045Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4372198Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4372393Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4372504Z self.run() 2023-01-11T22:54:21.4372711Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4373023Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4373390Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4373529Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4373903Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4374012Z getattr(self, test_name)() 2023-01-11T22:54:21.4374373Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4374473Z fn() 2023-01-11T22:54:21.4374842Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4374969Z test(self, **param_kwargs) 2023-01-11T22:54:21.4375334Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4375460Z return func(*args, **kwargs) 2023-01-11T22:54:21.4375741Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4375839Z self.run_subtests( 2023-01-11T22:54:21.4376199Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4376367Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4376735Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4376890Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4377267Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4377392Z output = model(*input) 2023-01-11T22:54:21.4377722Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4377844Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4378226Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4378403Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4378777Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4378901Z _lazy_init(state, module) 2023-01-11T22:54:21.4379257Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4379425Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4379830Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4380068Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4380399Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4380527Z return func(*args, **kwargs) 2023-01-11T22:54:21.4380908Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4381075Z p_assert( 2023-01-11T22:54:21.4381432Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4381562Z traceback.print_stack() 2023-01-11T22:54:21.4382317Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4382455Z File "", line 1, in 2023-01-11T22:54:21.4382651Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4382798Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4383003Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4383156Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4383378Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4383484Z self.run() 2023-01-11T22:54:21.4383690Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4383836Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4384164Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4384299Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4384671Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4384797Z getattr(self, test_name)() 2023-01-11T22:54:21.4385160Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4385259Z fn() 
2023-01-11T22:54:21.4385629Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4385756Z test(self, **param_kwargs) 2023-01-11T22:54:21.4386098Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4386224Z return func(*args, **kwargs) 2023-01-11T22:54:21.4386507Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4386624Z self.run_subtests( 2023-01-11T22:54:21.4386988Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4387152Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4387521Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4387676Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4388041Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4388165Z output = model(*input) 2023-01-11T22:54:21.4388495Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4388637Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4389019Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4389260Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4389636Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4389761Z _lazy_init(state, module) 2023-01-11T22:54:21.4390097Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4390318Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4390737Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4390885Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4391225Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4391353Z return func(*args, **kwargs) 2023-01-11T22:54:21.4391734Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4391844Z p_assert( 2023-01-11T22:54:21.4392163Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4392293Z traceback.print_stack() 2023-01-11T22:54:21.4392544Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:54:21.4392789Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:54:21.4393194Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.4393592Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.4394342Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4395101Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4395850Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4396592Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4397333Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4398068Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4398803Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4399669Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4400418Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4401151Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4401291Z File "", line 1, in 2023-01-11T22:54:21.4401491Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4401638Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4401847Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4402002Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4402218Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4402326Z self.run() 2023-01-11T22:54:21.4402531Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4402680Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4403015Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4403153Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4403523Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4403650Z getattr(self, test_name)() 2023-01-11T22:54:21.4404013Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4404116Z fn() 2023-01-11T22:54:21.4404486Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4404612Z test(self, **param_kwargs) 2023-01-11T22:54:21.4404957Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4405084Z return func(*args, **kwargs) 2023-01-11T22:54:21.4405364Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4405484Z self.run_subtests( 2023-01-11T22:54:21.4405843Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4406007Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4406378Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4406535Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4406894Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4407017Z output = model(*input) 2023-01-11T22:54:21.4407345Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4407486Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4407943Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4408122Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4408495Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4408619Z _lazy_init(state, module) 2023-01-11T22:54:21.4409006Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4409188Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4409599Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4409746Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4410086Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4410221Z return func(*args, **kwargs) 2023-01-11T22:54:21.4410599Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4410703Z p_assert( 2023-01-11T22:54:21.4411023Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4411153Z traceback.print_stack() 2023-01-11T22:54:21.4411288Z File "", line 1, in 2023-01-11T22:54:21.4411499Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4411644Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4411849Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4412003Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4412219Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4412310Z self.run() 2023-01-11T22:54:21.4412515Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4412665Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4413212Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4413353Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4413729Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4413856Z getattr(self, test_name)() 2023-01-11T22:54:21.4414199Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4414301Z fn() 2023-01-11T22:54:21.4414672Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4414802Z test(self, **param_kwargs) 2023-01-11T22:54:21.4415162Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.4415290Z return func(*args, **kwargs) 2023-01-11T22:54:21.4415571Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4415688Z self.run_subtests( 2023-01-11T22:54:21.4416029Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4416195Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4416565Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4416721Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4417098Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4417310Z output = model(*input) 2023-01-11T22:54:21.4417645Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4417787Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4418146Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4418389Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4418779Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4418904Z _lazy_init(state, module) 2023-01-11T22:54:21.4419258Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4419429Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4419834Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4419981Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.4420320Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4420429Z return func(*args, **kwargs) 2023-01-11T22:54:21.4420812Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4420917Z p_assert( 2023-01-11T22:54:21.4421259Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4421387Z traceback.print_stack() 2023-01-11T22:54:21.4421636Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:54:21.4421876Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:54:21.4422281Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.4423019Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4423156Z File "", line 1, in 2023-01-11T22:54:21.4423371Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4423517Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4423724Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4423877Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4424095Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4424204Z self.run() 2023-01-11T22:54:21.4424390Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4424539Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4424886Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4425022Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4425389Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4425514Z getattr(self, test_name)() 2023-01-11T22:54:21.4425875Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4425977Z fn() 2023-01-11T22:54:21.4426324Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4426516Z test(self, **param_kwargs) 2023-01-11T22:54:21.4426883Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4427013Z return func(*args, **kwargs) 2023-01-11T22:54:21.4427294Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4427412Z self.run_subtests( 2023-01-11T22:54:21.4427817Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4427988Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4428339Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4428495Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4428870Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4428998Z output = model(*input) 2023-01-11T22:54:21.4429326Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4429466Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4429847Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4430028Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4430379Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4430502Z _lazy_init(state, module) 2023-01-11T22:54:21.4430858Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4431029Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4431432Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4431577Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4431914Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4432042Z return func(*args, **kwargs) 2023-01-11T22:54:21.4432403Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4432510Z p_assert( 2023-01-11T22:54:21.4432850Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4432982Z traceback.print_stack() 2023-01-11T22:54:21.4433382Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.4434133Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4434271Z File "", line 1, in 2023-01-11T22:54:21.4434486Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4434637Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4434826Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4434980Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4435199Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4435305Z self.run() 2023-01-11T22:54:21.4435511Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4435763Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4436114Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4436249Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4436594Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4436718Z getattr(self, test_name)() 2023-01-11T22:54:21.4437128Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4437236Z fn() 2023-01-11T22:54:21.4437612Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4437742Z test(self, **param_kwargs) 2023-01-11T22:54:21.4438098Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4438231Z return func(*args, **kwargs) 2023-01-11T22:54:21.4438492Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4438611Z self.run_subtests( 2023-01-11T22:54:21.4438968Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4439134Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4439505Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4439660Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4440041Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4440164Z output = model(*input) 2023-01-11T22:54:21.4440474Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4440619Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4440997Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4441177Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4441548Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:54:21.4441675Z _lazy_init(state, module) 2023-01-11T22:54:21.4442031Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4442206Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4442590Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4442735Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4443083Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4443210Z return func(*args, **kwargs) 2023-01-11T22:54:21.4443590Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4443698Z p_assert( 2023-01-11T22:54:21.4444044Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4444175Z traceback.print_stack() 2023-01-11T22:54:21.4444405Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:54:21.4444642Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:54:21.4445044Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.4445515Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.4446265Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4447064Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4447207Z File "", line 1, in 2023-01-11T22:54:21.4447424Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4447572Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4447786Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4447920Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4448175Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4448282Z self.run() 2023-01-11T22:54:21.4448489Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4448643Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4448997Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4449132Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4449495Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4449602Z getattr(self, test_name)() 2023-01-11T22:54:21.4449964Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4450071Z fn() 2023-01-11T22:54:21.4450440Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4450565Z test(self, **param_kwargs) 2023-01-11T22:54:21.4450927Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:54:21.4451060Z return func(*args, **kwargs) 2023-01-11T22:54:21.4451343Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4451440Z self.run_subtests( 2023-01-11T22:54:21.4451800Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4451965Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4452341Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4452496Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4453021Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4453151Z output = model(*input) 2023-01-11T22:54:21.4453494Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4453619Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4454000Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4454178Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4454548Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4454764Z _lazy_init(state, module) 2023-01-11T22:54:21.4455129Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4455300Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4455699Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4455886Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.4456245Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4456374Z return func(*args, **kwargs) 2023-01-11T22:54:21.4456751Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4456856Z p_assert( 2023-01-11T22:54:21.4457195Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4457328Z traceback.print_stack() 2023-01-11T22:54:21.4457461Z File "", line 1, in 2023-01-11T22:54:21.4457653Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4457797Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4458002Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4458159Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4458375Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4458482Z self.run() 2023-01-11T22:54:21.4458687Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4458816Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4459160Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4459295Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4459663Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4459788Z getattr(self, test_name)() 2023-01-11T22:54:21.4460151Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4460252Z fn() 2023-01-11T22:54:21.4460620Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4460728Z test(self, **param_kwargs) 2023-01-11T22:54:21.4461086Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4461214Z return func(*args, **kwargs) 2023-01-11T22:54:21.4461494Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4461615Z self.run_subtests( 2023-01-11T22:54:21.4461973Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4462137Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4462506Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4462642Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4463019Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4463142Z output = model(*input) 2023-01-11T22:54:21.4463472Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4463614Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4463994Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4464254Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4464626Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4464733Z _lazy_init(state, module) 2023-01-11T22:54:21.4465089Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 
163, in _lazy_init 2023-01-11T22:54:21.4465306Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4465721Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4465867Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4466208Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4466337Z return func(*args, **kwargs) 2023-01-11T22:54:21.4466723Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4466810Z p_assert( 2023-01-11T22:54:21.4467150Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4467279Z traceback.print_stack() 2023-01-11T22:54:21.4467526Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:54:21.4467768Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:54:21.4468172Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.4468570Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.4469325Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4470078Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4470212Z File "", line 1, in 2023-01-11T22:54:21.4470414Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4470560Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4470765Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4470921Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4471140Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4471246Z self.run() 2023-01-11T22:54:21.4471452Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4471601Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4471960Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4472100Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4472470Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4472594Z getattr(self, test_name)() 2023-01-11T22:54:21.4472955Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4473055Z fn() 2023-01-11T22:54:21.4473425Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4473615Z test(self, **param_kwargs) 2023-01-11T22:54:21.4473961Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4474089Z return func(*args, **kwargs) 2023-01-11T22:54:21.4474373Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4474534Z self.run_subtests( 2023-01-11T22:54:21.4474904Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4475071Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4475443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4475600Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4475966Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4476087Z output = model(*input) 2023-01-11T22:54:21.4476417Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4476559Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4476941Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4477120Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4477488Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4477612Z _lazy_init(state, module) 2023-01-11T22:54:21.4477951Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4478125Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4478526Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4478674Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4479013Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4479143Z return func(*args, **kwargs) 2023-01-11T22:54:21.4479526Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4479631Z p_assert( 2023-01-11T22:54:21.4479954Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4480088Z traceback.print_stack() 2023-01-11T22:54:21.4480219Z File "", line 1, in 2023-01-11T22:54:21.4480432Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4480582Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4480785Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4480937Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4481153Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4481242Z self.run() 2023-01-11T22:54:21.4481449Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4481600Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4481943Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4482077Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4482439Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4482563Z getattr(self, test_name)() 2023-01-11T22:54:21.4482976Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4483078Z fn() 2023-01-11T22:54:21.4483448Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:54:21.4483574Z test(self, **param_kwargs) 2023-01-11T22:54:21.4483982Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4484118Z return func(*args, **kwargs) 2023-01-11T22:54:21.4484398Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4484516Z self.run_subtests( 2023-01-11T22:54:21.4484862Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4485030Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4485399Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4485554Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4485928Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4486049Z output = model(*input) 2023-01-11T22:54:21.4486379Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4486520Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4486883Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4487060Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4487432Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4487560Z _lazy_init(state, module) 2023-01-11T22:54:21.4487922Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4488092Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:54:21.4488493Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4488638Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4488975Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4489084Z return func(*args, **kwargs) 2023-01-11T22:54:21.4489461Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4489567Z p_assert( 2023-01-11T22:54:21.4489910Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4490042Z traceback.print_stack() 2023-01-11T22:54:21.4490290Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:54:21.4490531Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:54:21.4490938Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.4491322Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.4492072Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4493050Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4493301Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:54:21.4493614Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:54:21.4494033Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.4494429Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.4494676Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:54:21.4494918Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:54:21.4495314Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.4495711Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.4495935Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:54:21.4496168Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:54:21.4496557Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.4496946Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 
2023-01-11T22:54:21.4497188Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:54:21.4497419Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:54:21.4497812Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.4498204Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.4498444Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:54:21.4498658Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:54:21.4499049Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.4499443Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.4499681Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:54:21.4499914Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:54:21.4500307Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.4500701Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.4501449Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4501769Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:54:21.4502007Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:54:21.4502410Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.4502833Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.4503590Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4504327Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4505079Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4505815Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4506549Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4507287Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4508024Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4508759Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4509498Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4510227Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4510959Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4511792Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4512536Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4513263Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4514002Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4514723Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4515449Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4516186Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4516915Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4517642Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4518379Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4519108Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4519837Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4520684Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4520939Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:54:21.4521182Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:54:21.4521589Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.4521989Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.4522730Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4523752Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:451: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.4523893Z shapes.append(param.shape) 2023-01-11T22:54:21.4524135Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:54:21.4524521Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.4524765Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:54:21.4525165Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.4525905Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4526152Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:54:21.4526389Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:54:21.4526782Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 
2023-01-11T22:54:21.4527176Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.4527921Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4528166Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:54:21.4528402Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:54:21.4528861Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.4529257Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.4530042Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4530294Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 0 2023-01-11T22:54:21.4530531Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 1 2023-01-11T22:54:21.4530930Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:54:21.4531330Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 
2023-01-11T22:54:21.4532076Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4532320Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 0 2023-01-11T22:54:21.4532559Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 1 2023-01-11T22:54:21.4533129Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:54:21.4533516Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:54:21.4534266Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4534512Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 0 2023-01-11T22:54:21.4534746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 1 2023-01-11T22:54:21.4535143Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:54:21.4535535Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:54:21.4536272Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4536520Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 0 2023-01-11T22:54:21.4536759Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 1 2023-01-11T22:54:21.4537153Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:54:21.4537546Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:54:21.4538289Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4538604Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 0 2023-01-11T22:54:21.4538841Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 1 2023-01-11T22:54:21.4539301Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:54:21.4539713Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:54:21.4540456Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4540704Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 0 2023-01-11T22:54:21.4540941Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 1 2023-01-11T22:54:21.4541337Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:54:21.4541732Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:54:21.4542471Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4542714Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 0 2023-01-11T22:54:21.4542938Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 1 2023-01-11T22:54:21.4543335Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:54:21.4543727Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:54:21.4544474Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4544721Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 0 2023-01-11T22:54:21.4544957Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 1 2023-01-11T22:54:21.4545356Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:54:21.4545746Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:54:21.4546489Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4547221Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4548026Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4548808Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4549552Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4550284Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4551019Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4551746Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4552478Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4553209Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4553938Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4554672Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4555403Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4556132Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4556924Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4557701Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4558442Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4559171Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4559906Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4560633Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4561363Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4562094Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4562823Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4563552Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4564283Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4565009Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4565796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4566564Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4566797Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 0 2023-01-11T22:54:21.4567038Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 1 2023-01-11T22:54:21.4567449Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:54:21.4567846Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:54:21.4568584Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4569314Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4569557Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 1 2023-01-11T22:54:21.4569798Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 0 2023-01-11T22:54:21.4570194Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:54:21.4570584Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:54:21.4571322Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4572050Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4572323Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 1 2023-01-11T22:54:21.4572540Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 0 2023-01-11T22:54:21.4573116Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:54:21.4573522Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:54:21.4574260Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4575080Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4575323Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 1 2023-01-11T22:54:21.4575650Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 0 2023-01-11T22:54:21.4576061Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:54:21.4576454Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:54:21.4577186Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4577924Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4578041Z dist init r=1, world=2 2023-01-11T22:54:21.4578372Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4578671Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4578984Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4579291Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4579599Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4579904Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4580205Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4580505Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4580811Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4581112Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4581415Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:54:21.4581745Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4581859Z dist init r=0, world=2 2023-01-11T22:54:21.4582158Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4582530Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4582837Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4583186Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4583497Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4583800Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4584106Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4584409Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4584713Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.4585017Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4585319Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4585628Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4585714Z ok (30.264s) 2023-01-11T22:54:21.4586079Z test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87604 2023-01-11T22:54:21.4586302Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87605 2023-01-11T22:54:21.4586691Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.4586871Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.4587256Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.4587450Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.4587824Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.4587984Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.4588365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.4588555Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.4588803Z INFO:torch.distributed.distributed_c10d:Added key: 
store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.4589048Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.4589448Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.4589847Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.4590149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.4590379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.4591455Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.4591579Z warnings.warn( 2023-01-11T22:54:21.4591806Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2023-01-11T22:54:21.4592815Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.4592935Z warnings.warn( 2023-01-11T22:54:21.4593178Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2023-01-11T22:54:21.4593576Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.4594115Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2023-01-11T22:54:21.4594230Z warnings.warn( 2023-01-11T22:54:21.4594994Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4595392Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2023-01-11T22:54:21.4595934Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:288: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 
2023-01-11T22:54:21.4596051Z warnings.warn( 2023-01-11T22:54:21.4596166Z File "", line 1, in 2023-01-11T22:54:21.4596382Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4596529Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4596742Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4596896Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4597110Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4597217Z self.run() 2023-01-11T22:54:21.4597403Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4597552Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4597906Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4598044Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4598410Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4598536Z getattr(self, test_name)() 2023-01-11T22:54:21.4598900Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4599060Z fn() 2023-01-11T22:54:21.4599420Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4599546Z test(self, **param_kwargs) 2023-01-11T22:54:21.4599906Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4600033Z return func(*args, **kwargs) 2023-01-11T22:54:21.4600362Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4600485Z self.run_subtests( 2023-01-11T22:54:21.4600849Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4601015Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4601362Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4601522Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4601902Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4602024Z output = model(*input) 2023-01-11T22:54:21.4602352Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4602495Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4602874Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4603052Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4603404Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4603528Z _lazy_init(state, module) 2023-01-11T22:54:21.4603888Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4604062Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4604461Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4604605Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4604948Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4605076Z return func(*args, **kwargs) 2023-01-11T22:54:21.4605435Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4605543Z p_assert( 2023-01-11T22:54:21.4605881Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4606009Z traceback.print_stack() 2023-01-11T22:54:21.4606762Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4606899Z File "", line 1, in 2023-01-11T22:54:21.4607115Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4607262Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4607467Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4607601Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4607813Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4607918Z self.run() 2023-01-11T22:54:21.4608126Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4608338Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4608688Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4608824Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4609169Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4609344Z getattr(self, test_name)() 2023-01-11T22:54:21.4609721Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4609826Z fn() 
2023-01-11T22:54:21.4610195Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4610321Z test(self, **param_kwargs) 2023-01-11T22:54:21.4610678Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4610809Z return func(*args, **kwargs) 2023-01-11T22:54:21.4611071Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4611189Z self.run_subtests( 2023-01-11T22:54:21.4611548Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4611716Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4612086Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4612239Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4612613Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4612734Z output = model(*input) 2023-01-11T22:54:21.4613233Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4613384Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4613770Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4613951Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4614323Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4614447Z _lazy_init(state, module) 2023-01-11T22:54:21.4614800Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4614970Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4615367Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4615497Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4615837Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4615966Z return func(*args, **kwargs) 2023-01-11T22:54:21.4616347Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4616450Z p_assert( 2023-01-11T22:54:21.4616791Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4616921Z traceback.print_stack() 2023-01-11T22:54:21.4617169Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2023-01-11T22:54:21.4617397Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2023-01-11T22:54:21.4617795Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.4618292Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2023-01-11T22:54:21.4619100Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4619856Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4619991Z File "", line 1, in 2023-01-11T22:54:21.4620207Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4620356Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4620560Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4620711Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4620910Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4621016Z self.run() 2023-01-11T22:54:21.4621231Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4621382Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4621733Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4621870Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4622237Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4622346Z getattr(self, test_name)() 2023-01-11T22:54:21.4622716Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4622815Z fn() 2023-01-11T22:54:21.4623183Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4623311Z test(self, **param_kwargs) 2023-01-11T22:54:21.4623674Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:54:21.4623801Z return func(*args, **kwargs) 2023-01-11T22:54:21.4624080Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4624178Z self.run_subtests( 2023-01-11T22:54:21.4624535Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4624700Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4625071Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4625225Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4625601Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4625722Z output = model(*input) 2023-01-11T22:54:21.4626053Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4626175Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4626556Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4626733Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4627107Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4627296Z _lazy_init(state, module) 2023-01-11T22:54:21.4627663Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4627833Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4628232Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4628424Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.4628761Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4628891Z return func(*args, **kwargs) 2023-01-11T22:54:21.4629273Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4629378Z p_assert( 2023-01-11T22:54:21.4629720Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4629850Z traceback.print_stack() 2023-01-11T22:54:21.4629980Z File "", line 1, in 2023-01-11T22:54:21.4630172Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4630319Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4630521Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4630680Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4630896Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4631002Z self.run() 2023-01-11T22:54:21.4631204Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4631353Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4631678Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4631819Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4632184Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4632311Z getattr(self, test_name)() 2023-01-11T22:54:21.4632668Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4632767Z fn() 2023-01-11T22:54:21.4633136Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4633262Z test(self, **param_kwargs) 2023-01-11T22:54:21.4633604Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4633729Z return func(*args, **kwargs) 2023-01-11T22:54:21.4634009Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4634128Z self.run_subtests( 2023-01-11T22:54:21.4634487Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4634651Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4635014Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4635170Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4635530Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4635651Z output = model(*input) 2023-01-11T22:54:21.4635981Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4636121Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4636500Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4636736Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4637113Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4637235Z _lazy_init(state, module) 2023-01-11T22:54:21.4637621Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 
163, in _lazy_init 2023-01-11T22:54:21.4637797Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4638205Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4638351Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4638693Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4638823Z return func(*args, **kwargs) 2023-01-11T22:54:21.4639200Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4639304Z p_assert( 2023-01-11T22:54:21.4639625Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4639753Z traceback.print_stack() 2023-01-11T22:54:21.4640005Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2023-01-11T22:54:21.4640254Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2023-01-11T22:54:21.4640656Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.4641051Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2023-01-11T22:54:21.4641803Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4642560Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4642695Z File "", line 1, in 2023-01-11T22:54:21.4642910Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4643038Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4643246Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4643402Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4643618Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4643724Z self.run() 2023-01-11T22:54:21.4643929Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4644078Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4644427Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4644546Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4644909Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4645036Z getattr(self, test_name)() 2023-01-11T22:54:21.4645399Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4645503Z fn() 2023-01-11T22:54:21.4645869Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4646058Z test(self, **param_kwargs) 2023-01-11T22:54:21.4646426Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4646535Z return func(*args, **kwargs) 2023-01-11T22:54:21.4646860Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4646982Z self.run_subtests( 2023-01-11T22:54:21.4647346Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4647511Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4647876Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4648031Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4648413Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4648516Z output = model(*input) 2023-01-11T22:54:21.4648845Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4648985Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4649366Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4649542Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4649911Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4650032Z _lazy_init(state, module) 2023-01-11T22:54:21.4650387Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4650542Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4650941Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4651086Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4651427Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4651558Z return func(*args, **kwargs) 2023-01-11T22:54:21.4651936Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4652040Z p_assert( 2023-01-11T22:54:21.4652377Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4652487Z traceback.print_stack() 2023-01-11T22:54:21.4652618Z File "", line 1, in 2023-01-11T22:54:21.4652837Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4653169Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4653382Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4653538Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4653756Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4653845Z self.run() 2023-01-11T22:54:21.4654057Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4654205Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4654552Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4654687Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4655048Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4655269Z getattr(self, test_name)() 2023-01-11T22:54:21.4655638Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4655720Z fn() 2023-01-11T22:54:21.4656088Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:54:21.4656212Z test(self, **param_kwargs) 2023-01-11T22:54:21.4656628Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4656764Z return func(*args, **kwargs) 2023-01-11T22:54:21.4657046Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4657161Z self.run_subtests( 2023-01-11T22:54:21.4657525Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4657676Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4658040Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4658196Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4658573Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4658698Z output = model(*input) 2023-01-11T22:54:21.4659028Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4659166Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4659542Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4659701Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4660071Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4660198Z _lazy_init(state, module) 2023-01-11T22:54:21.4660554Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4660724Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:54:21.4661125Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4661271Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4661614Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4661724Z return func(*args, **kwargs) 2023-01-11T22:54:21.4662105Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4662212Z p_assert( 2023-01-11T22:54:21.4662555Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4662684Z traceback.print_stack() 2023-01-11T22:54:21.4662933Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2023-01-11T22:54:21.4663179Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2023-01-11T22:54:21.4663585Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.4663985Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2023-01-11T22:54:21.4664718Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4665535Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4665670Z File "", line 1, in 2023-01-11T22:54:21.4665928Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4666082Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4666289Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4666442Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4666659Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4666765Z self.run() 2023-01-11T22:54:21.4666955Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4667102Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4667454Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4667590Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4667955Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4668079Z getattr(self, test_name)() 2023-01-11T22:54:21.4668442Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4668543Z fn() 2023-01-11T22:54:21.4668888Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4669014Z test(self, **param_kwargs) 2023-01-11T22:54:21.4669372Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4669503Z return func(*args, **kwargs) 2023-01-11T22:54:21.4669784Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4669901Z 
self.run_subtests( 2023-01-11T22:54:21.4670261Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4670427Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4670778Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4670934Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4671311Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4671435Z output = model(*input) 2023-01-11T22:54:21.4671764Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4671904Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4672279Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4672455Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4672838Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4672967Z _lazy_init(state, module) 2023-01-11T22:54:21.4673323Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4673492Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4673890Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4674095Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4674440Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4674570Z return func(*args, **kwargs) 2023-01-11T22:54:21.4674929Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4675080Z p_assert( 2023-01-11T22:54:21.4675432Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4675562Z traceback.print_stack() 2023-01-11T22:54:21.4675695Z File "", line 1, in 2023-01-11T22:54:21.4675907Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4676052Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4676255Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4676395Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4676609Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4676714Z self.run() 2023-01-11T22:54:21.4676918Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4677067Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4677411Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4677545Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4677889Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4678016Z getattr(self, test_name)() 2023-01-11T22:54:21.4678376Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4678482Z fn() 2023-01-11T22:54:21.4678848Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4678975Z test(self, **param_kwargs) 2023-01-11T22:54:21.4679333Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.4679458Z return func(*args, **kwargs) 2023-01-11T22:54:21.4679722Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4679840Z self.run_subtests( 2023-01-11T22:54:21.4680196Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4680362Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4680724Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4680883Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4681258Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4681378Z output = model(*input) 2023-01-11T22:54:21.4681691Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4681836Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4682216Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4682393Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4682760Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4682882Z _lazy_init(state, module) 2023-01-11T22:54:21.4683234Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4683465Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4683871Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4683999Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.4684379Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4684514Z return func(*args, **kwargs) 2023-01-11T22:54:21.4684898Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4685002Z p_assert( 2023-01-11T22:54:21.4685337Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4685464Z traceback.print_stack() 2023-01-11T22:54:21.4685714Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2023-01-11T22:54:21.4685942Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2023-01-11T22:54:21.4686342Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.4686743Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2023-01-11T22:54:21.4687494Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4688241Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4688378Z File "", line 1, in 2023-01-11T22:54:21.4688594Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4688740Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4688950Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4689106Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4689304Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4689409Z self.run() 2023-01-11T22:54:21.4689613Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4689762Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4690106Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4690247Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4690612Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4690719Z getattr(self, test_name)() 2023-01-11T22:54:21.4691080Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4691180Z fn() 2023-01-11T22:54:21.4691542Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4691663Z test(self, **param_kwargs) 2023-01-11T22:54:21.4692017Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4692142Z return func(*args, **kwargs) 2023-01-11T22:54:21.4692419Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4692578Z self.run_subtests( 2023-01-11T22:54:21.4693084Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4693255Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4693692Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4693859Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4693989Z File "", line 1, in 2023-01-11T22:54:21.4694376Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4694495Z output = model(*input) 2023-01-11T22:54:21.4694801Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4694948Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4695160Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4695305Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4695686Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4695864Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4696072Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4696225Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4696578Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4696701Z _lazy_init(state, module) 2023-01-11T22:54:21.4696913Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4697018Z self.run() 2023-01-11T22:54:21.4697373Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4697541Z 
_share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4697743Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4697891Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4698276Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4698417Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4698759Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4698894Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4699232Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4699359Z return func(*args, **kwargs) 2023-01-11T22:54:21.4699719Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4699845Z getattr(self, test_name)() 2023-01-11T22:54:21.4700204Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4700307Z p_assert( 2023-01-11T22:54:21.4700673Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4700771Z fn() 2023-01-11T22:54:21.4701107Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4701234Z traceback.print_stack() 2023-01-11T22:54:21.4701605Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4701807Z test(self, **param_kwargs) 2023-01-11T22:54:21.4702153Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4702279Z return 
func(*args, **kwargs) 2023-01-11T22:54:21.4702562Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4702677Z self.run_subtests( 2023-01-11T22:54:21.4703082Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4703251Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4703623Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4703777Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4704135Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4704259Z output = model(*input) 2023-01-11T22:54:21.4704591Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4704732Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4705105Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4705283Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4705653Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4705772Z _lazy_init(state, module) 2023-01-11T22:54:21.4706107Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4706272Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4706670Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4706808Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.4707140Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4707263Z return func(*args, **kwargs) 2023-01-11T22:54:21.4707639Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4707742Z p_assert( 2023-01-11T22:54:21.4708061Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4708183Z traceback.print_stack() 2023-01-11T22:54:21.4708428Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2023-01-11T22:54:21.4708671Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2023-01-11T22:54:21.4709077Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.4709476Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2023-01-11T22:54:21.4710226Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4710968Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4711163Z File "", line 1, in 2023-01-11T22:54:21.4711377Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4711504Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4711707Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4711859Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4712111Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4712224Z self.run() 2023-01-11T22:54:21.4712428Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4712576Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4712928Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4713044Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4713405Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4713534Z getattr(self, test_name)() 2023-01-11T22:54:21.4713896Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4713993Z fn() 2023-01-11T22:54:21.4714351Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4714479Z test(self, **param_kwargs) 2023-01-11T22:54:21.4714829Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4714938Z return func(*args, **kwargs) 2023-01-11T22:54:21.4715212Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4715324Z self.run_subtests( 2023-01-11T22:54:21.4715681Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4715845Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4716206Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4716358Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4716730Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4716835Z output = model(*input) 2023-01-11T22:54:21.4717164Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4717305Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4717677Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4717853Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4718223Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4718345Z _lazy_init(state, module) 2023-01-11T22:54:21.4718699Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4718851Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4719252Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4719395Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4719729Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4719852Z return func(*args, **kwargs) 2023-01-11T22:54:21.4720229Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4720436Z p_assert( 2023-01-11T22:54:21.4720781Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4720891Z traceback.print_stack() 2023-01-11T22:54:21.4721021Z File "", line 1, in 2023-01-11T22:54:21.4721233Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4721425Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4721641Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4721794Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4722007Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4722095Z self.run() 2023-01-11T22:54:21.4722295Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4722439Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4722788Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4722923Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4723288Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4723413Z getattr(self, test_name)() 2023-01-11T22:54:21.4723777Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4723860Z fn() 2023-01-11T22:54:21.4724226Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4724349Z test(self, **param_kwargs) 2023-01-11T22:54:21.4724706Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.4724832Z return func(*args, **kwargs) 2023-01-11T22:54:21.4725117Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4725231Z self.run_subtests( 2023-01-11T22:54:21.4725585Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4725732Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4726099Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4726253Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4726632Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4726754Z output = model(*input) 2023-01-11T22:54:21.4727083Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4727223Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4727603Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4727762Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4728125Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4728248Z _lazy_init(state, module) 2023-01-11T22:54:21.4728600Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4728766Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4729163Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4729306Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.4729712Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4729822Z return func(*args, **kwargs) 2023-01-11T22:54:21.4730198Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4730302Z p_assert( 2023-01-11T22:54:21.4730687Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4730821Z traceback.print_stack() 2023-01-11T22:54:21.4731065Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2023-01-11T22:54:21.4731313Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2023-01-11T22:54:21.4731719Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.4732115Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2023-01-11T22:54:21.4732849Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4733751Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4733886Z File "", line 1, in 2023-01-11T22:54:21.4734096Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4734242Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4734450Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4734601Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4734816Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4734921Z self.run() 2023-01-11T22:54:21.4735108Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4735258Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4735598Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4735730Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4736092Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4736217Z getattr(self, test_name)() 2023-01-11T22:54:21.4736579Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4736681Z fn() 2023-01-11T22:54:21.4737028Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4737151Z test(self, **param_kwargs) 2023-01-11T22:54:21.4737505Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4737636Z return func(*args, **kwargs) 2023-01-11T22:54:21.4737916Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4738032Z self.run_subtests( 2023-01-11T22:54:21.4738384Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4738550Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4738896Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4739140Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4739525Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4739647Z output = model(*input) 2023-01-11T22:54:21.4740035Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4740180Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4740563Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4740739Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4741091Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4741212Z _lazy_init(state, module) 2023-01-11T22:54:21.4741573Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4741738Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4742132Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4742274Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4742615Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4742737Z return func(*args, **kwargs) 2023-01-11T22:54:21.4743098Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4743202Z p_assert( 2023-01-11T22:54:21.4743544Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4743672Z traceback.print_stack() 2023-01-11T22:54:21.4743802Z File "", line 1, in 2023-01-11T22:54:21.4744010Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4744154Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4744356Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4744490Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4744708Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4744812Z self.run() 2023-01-11T22:54:21.4745017Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4745164Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4745503Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4745637Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4745987Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4746108Z getattr(self, test_name)() 2023-01-11T22:54:21.4746470Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4746569Z fn() 2023-01-11T22:54:21.4746941Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4747067Z test(self, **param_kwargs) 2023-01-11T22:54:21.4747425Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.4747551Z return func(*args, **kwargs) 2023-01-11T22:54:21.4747812Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4747926Z self.run_subtests( 2023-01-11T22:54:21.4748359Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4748551Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4748914Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4749068Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4749494Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4749618Z output = model(*input) 2023-01-11T22:54:21.4749935Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4750077Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4750452Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4750633Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4751002Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4751125Z _lazy_init(state, module) 2023-01-11T22:54:21.4751473Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4751644Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4752040Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4752167Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.4752502Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4752628Z return func(*args, **kwargs) 2023-01-11T22:54:21.4753006Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4753114Z p_assert( 2023-01-11T22:54:21.4753451Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4753577Z traceback.print_stack() 2023-01-11T22:54:21.4753821Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2023-01-11T22:54:21.4754050Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2023-01-11T22:54:21.4754446Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.4754842Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2023-01-11T22:54:21.4755596Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4756350Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4756482Z File "", line 1, in 2023-01-11T22:54:21.4756691Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4756834Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4757041Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4757192Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4757457Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4757557Z self.run() 2023-01-11T22:54:21.4757764Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4757912Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4758261Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4758443Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4758819Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4758926Z getattr(self, test_name)() 2023-01-11T22:54:21.4759293Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4759394Z fn() 2023-01-11T22:54:21.4759759Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4759888Z test(self, **param_kwargs) 2023-01-11T22:54:21.4760246Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4760373Z return func(*args, **kwargs) 2023-01-11T22:54:21.4760647Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4760747Z self.run_subtests( 2023-01-11T22:54:21.4761104Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4761266Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4761632Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4761785Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4762161Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4762287Z output = model(*input) 2023-01-11T22:54:21.4762620Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4762742Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4763122Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4763297Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4763667Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4763790Z _lazy_init(state, module) 2023-01-11T22:54:21.4764147Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4764313Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4764715Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4764854Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4765175Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4765303Z return func(*args, **kwargs) 2023-01-11T22:54:21.4765682Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4765784Z p_assert( 2023-01-11T22:54:21.4766125Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4766250Z traceback.print_stack() 2023-01-11T22:54:21.4766380Z File "", line 1, in 2023-01-11T22:54:21.4766571Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4766778Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4766978Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4767129Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4767343Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4767448Z self.run() 2023-01-11T22:54:21.4767688Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4767844Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4768175Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4768308Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4768671Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4768795Z getattr(self, test_name)() 2023-01-11T22:54:21.4769159Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4769252Z fn() 2023-01-11T22:54:21.4769618Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4769741Z test(self, **param_kwargs) 2023-01-11T22:54:21.4770083Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.4770208Z return func(*args, **kwargs) 2023-01-11T22:54:21.4770487Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4770604Z self.run_subtests( 2023-01-11T22:54:21.4770961Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4771123Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4771490Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4771644Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4771998Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4772119Z output = model(*input) 2023-01-11T22:54:21.4772446Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4772588Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4773109Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4773288Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4773665Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4773791Z _lazy_init(state, module) 2023-01-11T22:54:21.4774127Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4774292Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4774690Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4774838Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.4775176Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4775301Z return func(*args, **kwargs) 2023-01-11T22:54:21.4775671Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4775773Z p_assert( 2023-01-11T22:54:21.4776092Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4776300Z traceback.print_stack() 2023-01-11T22:54:21.4776545Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2023-01-11T22:54:21.4776783Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2023-01-11T22:54:21.4777246Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.4778015Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4778758Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4779504Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4780242Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4780977Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4781114Z File "", line 1, in 2023-01-11T22:54:21.4781329Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4781468Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4781659Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4781811Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4782026Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4782130Z self.run() 2023-01-11T22:54:21.4782333Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4782482Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4782822Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4782945Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4783307Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4783431Z getattr(self, test_name)() 2023-01-11T22:54:21.4783791Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4783893Z fn() 2023-01-11T22:54:21.4784256Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4784382Z test(self, **param_kwargs) 2023-01-11T22:54:21.4784741Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4784850Z return func(*args, **kwargs) 2023-01-11T22:54:21.4785127Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4785319Z self.run_subtests( 2023-01-11T22:54:21.4785684Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4785846Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4786260Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4786421Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4786796Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4786900Z output = model(*input) 2023-01-11T22:54:21.4787229Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4787369Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4787755Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4787932Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4788299Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.4788422Z _lazy_init(state, module) 2023-01-11T22:54:21.4788783Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4788935Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4789335Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4789481Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4789824Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4789953Z return func(*args, **kwargs) 2023-01-11T22:54:21.4790332Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4790435Z p_assert( 2023-01-11T22:54:21.4790774Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4790884Z traceback.print_stack() 2023-01-11T22:54:21.4791284Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2023-01-11T22:54:21.4792028Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4792777Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4793526Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4794266Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4794997Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4795188Z File "", line 1, in 2023-01-11T22:54:21.4795406Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4795594Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4795803Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4795955Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4796170Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4796259Z self.run() 2023-01-11T22:54:21.4796463Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4796614Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4796967Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4797103Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4797470Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4797599Z getattr(self, test_name)() 2023-01-11T22:54:21.4797948Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4798048Z fn() 2023-01-11T22:54:21.4798415Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4798539Z test(self, **param_kwargs) 2023-01-11T22:54:21.4798895Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4799024Z return func(*args, **kwargs) 2023-01-11T22:54:21.4799303Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4799413Z self.run_subtests( 2023-01-11T22:54:21.4799751Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4799915Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4800284Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4800440Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4800815Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4800935Z output = model(*input) 2023-01-11T22:54:21.4801255Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4801397Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4801756Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4801931Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4802300Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4802424Z _lazy_init(state, module) 2023-01-11T22:54:21.4802779Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4802947Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4803342Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4803486Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4803892Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4804002Z return func(*args, **kwargs) 2023-01-11T22:54:21.4804381Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4804484Z p_assert( 2023-01-11T22:54:21.4804865Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4804996Z traceback.print_stack() 2023-01-11T22:54:21.4805241Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2023-01-11T22:54:21.4805481Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2023-01-11T22:54:21.4805886Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.4806277Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2023-01-11T22:54:21.4807019Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4807763Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4807893Z File "", line 1, in 2023-01-11T22:54:21.4808104Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4808248Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4808452Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4808603Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4808815Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4808919Z self.run() 2023-01-11T22:54:21.4809105Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4809254Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4809600Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4809734Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4810096Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4810222Z getattr(self, test_name)() 2023-01-11T22:54:21.4810586Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4810672Z fn() 2023-01-11T22:54:21.4811038Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4811164Z test(self, **param_kwargs) 2023-01-11T22:54:21.4811520Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4811647Z return func(*args, **kwargs) 2023-01-11T22:54:21.4811923Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4812037Z self.run_subtests( 2023-01-11T22:54:21.4812393Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4812540Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4813122Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4813277Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4813653Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4813774Z output = model(*input) 2023-01-11T22:54:21.4814169Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4814319Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4814703Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4814862Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4815242Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4815370Z _lazy_init(state, module) 2023-01-11T22:54:21.4815720Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4815887Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4816285Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4816427Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4816764Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4816888Z return func(*args, **kwargs) 2023-01-11T22:54:21.4817247Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4817349Z p_assert( 2023-01-11T22:54:21.4817687Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4817816Z traceback.print_stack() 2023-01-11T22:54:21.4817941Z File "", line 1, in 2023-01-11T22:54:21.4818152Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4818297Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4818482Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4818636Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4818850Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4818950Z self.run() 2023-01-11T22:54:21.4819150Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4819297Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4819640Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4819779Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4820125Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4820249Z getattr(self, test_name)() 2023-01-11T22:54:21.4820609Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4820707Z fn() 2023-01-11T22:54:21.4821077Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4821197Z test(self, **param_kwargs) 2023-01-11T22:54:21.4821552Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.4821679Z return func(*args, **kwargs) 2023-01-11T22:54:21.4821939Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4822130Z self.run_subtests( 2023-01-11T22:54:21.4822494Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4822659Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4823027Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4823180Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4823600Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4823727Z output = model(*input) 2023-01-11T22:54:21.4824043Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4824183Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4824559Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4824739Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4825109Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4825233Z _lazy_init(state, module) 2023-01-11T22:54:21.4825586Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4825753Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4826135Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4826280Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.4826615Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4826739Z return func(*args, **kwargs) 2023-01-11T22:54:21.4827120Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4827222Z p_assert( 2023-01-11T22:54:21.4827551Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4827674Z traceback.print_stack() 2023-01-11T22:54:21.4827904Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2023-01-11T22:54:21.4828150Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2023-01-11T22:54:21.4828553Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.4828955Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2023-01-11T22:54:21.4829704Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4830450Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4830584Z File "", line 1, in 2023-01-11T22:54:21.4830798Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4830942Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4831152Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4831288Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4831560Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4831666Z self.run() 2023-01-11T22:54:21.4831869Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4832019Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4832366Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4832549Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4832920Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4833029Z getattr(self, test_name)() 2023-01-11T22:54:21.4833389Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4833483Z fn() 2023-01-11T22:54:21.4833854Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4833979Z test(self, **param_kwargs) 2023-01-11T22:54:21.4834333Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4834458Z return func(*args, **kwargs) 2023-01-11T22:54:21.4834737Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4834839Z self.run_subtests( 2023-01-11T22:54:21.4835193Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4835358Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4835723Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4835876Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4836255Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4836378Z output = model(*input) 2023-01-11T22:54:21.4836704Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4836825Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4837202Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4837377Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4837507Z File "", line 1, in 2023-01-11T22:54:21.4837869Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4837990Z _lazy_init(state, module) 2023-01-11T22:54:21.4838343Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4838513Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4838707Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4838852Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4839254Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4839400Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4839604Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4839755Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4840094Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4840219Z return func(*args, **kwargs) 2023-01-11T22:54:21.4840417Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4840582Z self.run() 2023-01-11T22:54:21.4840969Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4841072Z p_assert( 2023-01-11T22:54:21.4841279Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4841429Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4841814Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4841928Z traceback.print_stack() 2023-01-11T22:54:21.4842273Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4842406Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4842765Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4842886Z getattr(self, test_name)() 2023-01-11T22:54:21.4843247Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4843346Z fn() 2023-01-11T22:54:21.4843710Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4843816Z test(self, **param_kwargs) 2023-01-11T22:54:21.4844177Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4844301Z 
return func(*args, **kwargs) 2023-01-11T22:54:21.4844582Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4844695Z self.run_subtests( 2023-01-11T22:54:21.4845050Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4845209Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4845583Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4845723Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4846100Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4846221Z output = model(*input) 2023-01-11T22:54:21.4846551Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4846691Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4847067Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4847241Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4847605Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4847713Z _lazy_init(state, module) 2023-01-11T22:54:21.4848065Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4848236Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4848634Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4848778Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.4849120Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4849239Z return func(*args, **kwargs) 2023-01-11T22:54:21.4849620Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4849707Z p_assert( 2023-01-11T22:54:21.4850043Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4850242Z traceback.print_stack() 2023-01-11T22:54:21.4850490Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2023-01-11T22:54:21.4850731Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2023-01-11T22:54:21.4851184Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.4851589Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2023-01-11T22:54:21.4852341Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4853263Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4853402Z File "", line 1, in 2023-01-11T22:54:21.4853604Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4853751Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4853956Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4854109Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4854319Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4854421Z self.run() 2023-01-11T22:54:21.4854627Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4854781Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4855110Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4855241Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4855605Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4855735Z getattr(self, test_name)() 2023-01-11T22:54:21.4856093Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4856194Z fn() 2023-01-11T22:54:21.4856564Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4856688Z test(self, **param_kwargs) 2023-01-11T22:54:21.4857028Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.4857158Z return func(*args, **kwargs) 2023-01-11T22:54:21.4857437Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4857552Z self.run_subtests( 2023-01-11T22:54:21.4857909Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4858074Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4858441Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4858590Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4858947Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4859065Z output = model(*input) 2023-01-11T22:54:21.4859486Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4859626Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4860004Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4860180Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4860644Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4860777Z _lazy_init(state, module) 2023-01-11T22:54:21.4861121Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4861283Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4861679Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4861829Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.4862165Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4862290Z return func(*args, **kwargs) 2023-01-11T22:54:21.4862665Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4862769Z p_assert( 2023-01-11T22:54:21.4863092Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4863221Z traceback.print_stack() 2023-01-11T22:54:21.4863350Z File "", line 1, in 2023-01-11T22:54:21.4863555Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.4863701Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.4863901Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.4864047Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.4864258Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.4864347Z self.run() 2023-01-11T22:54:21.4864548Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.4864690Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.4865034Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.4865170Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.4865535Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.4865660Z getattr(self, test_name)() 2023-01-11T22:54:21.4866000Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.4866098Z fn() 2023-01-11T22:54:21.4866471Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.4866595Z test(self, **param_kwargs) 2023-01-11T22:54:21.4866950Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.4867071Z return func(*args, **kwargs) 2023-01-11T22:54:21.4867347Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 243, in test_mixture_of_experts_with_delay_before_free 2023-01-11T22:54:21.4867459Z self.run_subtests( 2023-01-11T22:54:21.4867794Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.4867960Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.4868323Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.4868545Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.4868927Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.4869046Z output = model(*input) 2023-01-11T22:54:21.4869368Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.4869508Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.4869910Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.4870092Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.4870465Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.4870586Z _lazy_init(state, module) 2023-01-11T22:54:21.4870943Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.4871115Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.4871510Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.4871653Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.4871992Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.4872105Z return func(*args, **kwargs) 2023-01-11T22:54:21.4872486Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.4872591Z p_assert( 2023-01-11T22:54:21.4872927Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.4873054Z traceback.print_stack() 2023-01-11T22:54:21.4873303Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2023-01-11T22:54:21.4873572Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2023-01-11T22:54:21.4873974Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.4874356Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2023-01-11T22:54:21.4875108Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4875853Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4876099Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2023-01-11T22:54:21.4876335Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2023-01-11T22:54:21.4876741Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.4877135Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2023-01-11T22:54:21.4877377Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2023-01-11T22:54:21.4877613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2023-01-11T22:54:21.4878003Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.4878471Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2023-01-11T22:54:21.4878691Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2023-01-11T22:54:21.4878975Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2023-01-11T22:54:21.4879381Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2023-01-11T22:54:21.4879774Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 
2023-01-11T22:54:21.4880014Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2023-01-11T22:54:21.4880246Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2023-01-11T22:54:21.4880641Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.4881031Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2023-01-11T22:54:21.4881267Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2023-01-11T22:54:21.4881483Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2023-01-11T22:54:21.4881879Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.4882264Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2023-01-11T22:54:21.4882501Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 1 2023-01-11T22:54:21.4882736Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:20 to store for rank: 0 2023-01-11T22:54:21.4883126Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.4883512Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:20 with 2 nodes. 2023-01-11T22:54:21.4884259Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4884505Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 1 2023-01-11T22:54:21.4884743Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:21 to store for rank: 0 2023-01-11T22:54:21.4885135Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.4885509Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:21 with 2 nodes. 2023-01-11T22:54:21.4886249Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4886987Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4887796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4888583Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4889324Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4890060Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4890796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4891529Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4892259Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4893138Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4893882Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4894623Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4895359Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4896091Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4896904Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4897686Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4898425Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4899156Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4899882Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4900599Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4901333Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4902062Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4902786Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4903514Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4903757Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 0 2023-01-11T22:54:21.4903997Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:22 to store for rank: 1 2023-01-11T22:54:21.4904392Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.4904790Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:22 with 2 nodes. 2023-01-11T22:54:21.4905519Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4906651Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:451: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.4906796Z shapes.append(param.shape) 2023-01-11T22:54:21.4907024Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 0 2023-01-11T22:54:21.4907424Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.4907669Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:23 to store for rank: 1 2023-01-11T22:54:21.4908069Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:23 with 2 nodes. 2023-01-11T22:54:21.4908807Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4909048Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 0 2023-01-11T22:54:21.4909279Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:24 to store for rank: 1 2023-01-11T22:54:21.4909668Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 
2023-01-11T22:54:21.4910068Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:24 with 2 nodes. 2023-01-11T22:54:21.4910806Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4911051Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 1 2023-01-11T22:54:21.4911269Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:25 to store for rank: 0 2023-01-11T22:54:21.4911664Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.4912059Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:25 with 2 nodes. 2023-01-11T22:54:21.4912807Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4913055Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 1 2023-01-11T22:54:21.4913289Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:26 to store for rank: 0 2023-01-11T22:54:21.4913677Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 2023-01-11T22:54:21.4914069Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:26 with 2 nodes. 
2023-01-11T22:54:21.4914805Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4915119Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 1 2023-01-11T22:54:21.4915392Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:27 to store for rank: 0 2023-01-11T22:54:21.4915778Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:54:21.4916170Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:27 with 2 nodes. 2023-01-11T22:54:21.4916908Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4917157Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 1 2023-01-11T22:54:21.4917394Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:28 to store for rank: 0 2023-01-11T22:54:21.4917792Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:54:21.4918183Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:28 with 2 nodes. 2023-01-11T22:54:21.4918921Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4919170Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 1 2023-01-11T22:54:21.4919404Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:29 to store for rank: 0 2023-01-11T22:54:21.4919794Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:54:21.4920174Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:29 with 2 nodes. 2023-01-11T22:54:21.4920917Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4921159Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 1 2023-01-11T22:54:21.4921393Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:30 to store for rank: 0 2023-01-11T22:54:21.4921788Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:54:21.4922183Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:30 with 2 nodes. 2023-01-11T22:54:21.4922922Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4923164Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 0 2023-01-11T22:54:21.4923457Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:31 to store for rank: 1 2023-01-11T22:54:21.4923855Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:54:21.4924248Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:31 with 2 nodes. 2023-01-11T22:54:21.4925038Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4925269Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 0 2023-01-11T22:54:21.4925504Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:32 to store for rank: 1 2023-01-11T22:54:21.4925910Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:54:21.4926299Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:32 with 2 nodes. 2023-01-11T22:54:21.4927044Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4927286Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 1 2023-01-11T22:54:21.4927519Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:33 to store for rank: 0 2023-01-11T22:54:21.4927908Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:54:21.4928303Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:33 with 2 nodes. 2023-01-11T22:54:21.4929045Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4929774Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4930509Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4931250Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4931977Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4932705Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4933697Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4934502Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4935250Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4935985Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4936715Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4937448Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4938176Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4938904Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4939631Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4940362Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4941089Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4941813Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4942618Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4943374Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4944114Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4944848Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4945576Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4946301Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4947029Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4947758Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4948480Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4949213Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4949461Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 1 2023-01-11T22:54:21.4949700Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:34 to store for rank: 0 2023-01-11T22:54:21.4950096Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:54:21.4950491Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:34 with 2 nodes. 2023-01-11T22:54:21.4951286Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4952061Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4952304Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 1 2023-01-11T22:54:21.4952540Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:35 to store for rank: 0 2023-01-11T22:54:21.4952941Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:54:21.4953331Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:35 with 2 nodes. 2023-01-11T22:54:21.4954068Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4954800Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4955020Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 1 2023-01-11T22:54:21.4955255Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:36 to store for rank: 0 2023-01-11T22:54:21.4955647Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:54:21.4956037Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:36 with 2 nodes. 2023-01-11T22:54:21.4956767Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4957501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4957742Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 0 2023-01-11T22:54:21.4957975Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:37 to store for rank: 1 2023-01-11T22:54:21.4958375Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:54:21.4958764Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:37 with 2 nodes. 2023-01-11T22:54:21.4959489Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4960283Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4960397Z dist init r=1, world=2 2023-01-11T22:54:21.4960758Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4961083Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4961395Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4961701Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4962007Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4962311Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4962609Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4962911Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4963211Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4963513Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4963810Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:54:21.4964114Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.4964213Z dist init r=0, world=2 2023-01-11T22:54:21.4964539Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4964854Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4965162Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4965467Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4965773Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4966076Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4966377Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4966729Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4967027Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.4967365Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4967652Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4967952Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.4968053Z ok (30.264s) 2023-01-11T22:54:21.4968397Z test_nested_always_wrap_model_offload_false_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87951 2023-01-11T22:54:21.4968619Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87952 2023-01-11T22:54:21.4969011Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.4969189Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.4969573Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.4969761Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.4970114Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.4970294Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.4970677Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.4970869Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.4971115Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 
to store for rank: 0 2023-01-11T22:54:21.4971359Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.4971761Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.4972157Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.4972386Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.4972597Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.4972837Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.4973216Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.4974273Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.4974389Z warnings.warn( 2023-01-11T22:54:21.4975406Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.4975605Z warnings.warn( 2023-01-11T22:54:21.4975838Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.4976121Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.4976361Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.4976592Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.4976803Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.4977028Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.4977259Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.4977486Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.4977710Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.4977937Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.4978165Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.4978395Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.4979157Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4979905Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4980648Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4981390Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4982110Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4982847Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4983575Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4984373Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4985146Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4985893Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4986629Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4987361Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4988084Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4988818Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4989548Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4990276Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4991009Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4991740Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4992460Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4993258Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4994027Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4994764Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4995494Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4996219Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.4996944Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4997672Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4998399Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4999123Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.4999852Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5000579Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5000818Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5001053Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5001284Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5001608Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5001836Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5002063Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5002275Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5002546Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5002776Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5003004Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5003224Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5003449Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5003678Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5003904Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5004128Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5004337Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5004565Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5004788Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5005535Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5006277Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5007015Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5007743Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5008474Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.5009203Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5009929Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5010721Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5011488Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5012228Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5013214Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5013964Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5014690Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5015424Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5016150Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5016880Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5017605Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5018334Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5019055Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5019873Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5020653Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5021397Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.5022127Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5022854Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5023581Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5024307Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5025033Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5025761Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5026490Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5027215Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5027936Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5028727Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5029495Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5030232Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5030957Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5031689Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5032415Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5033146Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5033871Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.5034595Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5034837Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5035073Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5035288Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5035521Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5035755Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5035984Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5036208Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5036437Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5036659Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5036963Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5037173Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5037401Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5037627Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5037892Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5038123Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5038346Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5038464Z dist init r=1, world=2 2023-01-11T22:54:21.5038575Z dist init r=0, world=2 2023-01-11T22:54:21.5038660Z ok (5.314s) 2023-01-11T22:54:21.5038997Z test_nested_always_wrap_model_offload_false_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88034 2023-01-11T22:54:21.5039221Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88035 2023-01-11T22:54:21.5039612Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.5039788Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.5040171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.5040364Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.5040731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.5040888Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.5041267Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.5041462Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.5041710Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.5041953Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.5042356Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed 
store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.5042754Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.5042989Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.5043213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.5043436Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5043669Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5044695Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.5044813Z warnings.warn( 2023-01-11T22:54:21.5045829Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.5046007Z warnings.warn( 2023-01-11T22:54:21.5046240Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5046471Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5046745Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5046981Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5047212Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5047423Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5047651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5047882Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5048111Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5048336Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5048568Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5048796Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5049053Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5049283Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5049492Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5049724Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5049951Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5050174Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5050399Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5050628Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5050855Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5051078Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5051288Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5051513Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5051742Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5051965Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5052192Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5052419Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5052648Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5053008Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5053227Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5053451Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5053675Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5053983Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5054211Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5054435Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5054659Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5054938Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5055165Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5055372Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5055598Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5055823Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5056050Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5056274Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5056499Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5056727Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5056841Z dist init r=0, world=2 2023-01-11T22:54:21.5056935Z dist init r=1, world=2 2023-01-11T22:54:21.5057034Z ok (5.715s) 2023-01-11T22:54:21.5057382Z test_nested_always_wrap_model_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88117 2023-01-11T22:54:21.5057599Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88118 2023-01-11T22:54:21.5057995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.5058177Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.5058557Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.5058751Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.5059104Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.5059281Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.5059658Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.5059852Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.5060094Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.5060340Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.5060741Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.5061137Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.5061372Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.5061581Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.5061814Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5062043Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5063066Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.5063243Z warnings.warn( 2023-01-11T22:54:21.5064304Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.5064418Z warnings.warn( 2023-01-11T22:54:21.5064658Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5064893Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5065125Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5065356Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5065569Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5065798Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5066027Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5066256Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5066484Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5066709Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5066937Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5067163Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5067372Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5067598Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5067824Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5068051Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5068277Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5068504Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5068736Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5068952Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5069159Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5069382Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5069606Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5069826Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5070047Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5070271Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5070497Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5070782Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5071005Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5071211Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5071481Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5071712Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5071939Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5072162Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5072387Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5072615Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5072841Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5073048Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5073276Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5073502Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5073725Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5073945Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5074172Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5074395Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5074649Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5074872Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5074969Z dist init r=0, world=2 2023-01-11T22:54:21.5075081Z dist init r=1, world=2 2023-01-11T22:54:21.5075183Z ok (5.715s) 2023-01-11T22:54:21.5075524Z test_nested_always_wrap_model_offload_true_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88200 2023-01-11T22:54:21.5075746Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88201 2023-01-11T22:54:21.5076136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.5076314Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.5076678Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.5076877Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.5077245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.5077426Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.5077810Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.5078000Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.5078244Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.5078486Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.5078887Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.5079337Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.5079568Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.5079793Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.5080073Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5080311Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5081339Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.5081458Z warnings.warn( 2023-01-11T22:54:21.5082470Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.5082582Z warnings.warn( 2023-01-11T22:54:21.5082715Z File "", line 1, in 2023-01-11T22:54:21.5082930Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5083058Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5083262Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5083417Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5083632Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5083737Z self.run() 2023-01-11T22:54:21.5083942Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5084093Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5084449Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5084568Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5084932Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5085060Z getattr(self, test_name)() 2023-01-11T22:54:21.5085422Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5085527Z fn() 2023-01-11T22:54:21.5085894Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5086020Z test(self, **param_kwargs) 2023-01-11T22:54:21.5086360Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5086488Z return func(*args, **kwargs) 2023-01-11T22:54:21.5086751Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5086866Z self.run_subtests( 2023-01-11T22:54:21.5087222Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5087383Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5087751Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5087970Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5088337Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5088459Z output = model(*input) 2023-01-11T22:54:21.5088787Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5088925Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5089352Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5089532Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5089902Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5090025Z _lazy_init(state, module) 2023-01-11T22:54:21.5090379Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5090533Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5090935Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5091081Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5091425Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5091551Z return func(*args, **kwargs) 2023-01-11T22:54:21.5091932Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5092033Z p_assert( 2023-01-11T22:54:21.5092367Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5092479Z traceback.print_stack() 2023-01-11T22:54:21.5092608Z File "", line 1, in 2023-01-11T22:54:21.5092821Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5093178Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5093388Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5093542Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5093759Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5093852Z self.run() 2023-01-11T22:54:21.5094056Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5094202Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5094552Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5094688Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5095055Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5095184Z getattr(self, test_name)() 2023-01-11T22:54:21.5095546Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5095629Z fn() 2023-01-11T22:54:21.5095999Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5096126Z test(self, **param_kwargs) 2023-01-11T22:54:21.5096487Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.5096613Z return func(*args, **kwargs) 2023-01-11T22:54:21.5096870Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5096983Z self.run_subtests( 2023-01-11T22:54:21.5097347Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5097593Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5097966Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5098120Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5098499Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5098679Z output = model(*input) 2023-01-11T22:54:21.5099023Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5099162Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5099543Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5099702Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5100076Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5100196Z _lazy_init(state, module) 2023-01-11T22:54:21.5100546Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5100716Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5101122Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5101268Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.5101607Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5101715Z return func(*args, **kwargs) 2023-01-11T22:54:21.5102094Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5102202Z p_assert( 2023-01-11T22:54:21.5102539Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5102666Z traceback.print_stack() 2023-01-11T22:54:21.5102902Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5103137Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5103270Z File "", line 1, in 2023-01-11T22:54:21.5103465Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5103608Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5103809Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5103962Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5104176Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5104281Z self.run() 2023-01-11T22:54:21.5104482Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5104612Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5104961Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5105092Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5105456Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5105580Z getattr(self, test_name)() 2023-01-11T22:54:21.5105941Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5106040Z fn() 2023-01-11T22:54:21.5106404Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5106511Z test(self, **param_kwargs) 2023-01-11T22:54:21.5106939Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5107062Z return func(*args, **kwargs) 2023-01-11T22:54:21.5107318Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5107435Z self.run_subtests( 2023-01-11T22:54:21.5107832Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5107999Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5108369Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5108507Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5108883Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5109008Z output = model(*input) 2023-01-11T22:54:21.5109332Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5109469Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5109847Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5110023Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5110395Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5110500Z _lazy_init(state, module) 2023-01-11T22:54:21.5110855Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5111024Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5111425Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5111573Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5111909Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5112033Z return func(*args, **kwargs) 2023-01-11T22:54:21.5112412Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5112523Z p_assert( 2023-01-11T22:54:21.5112844Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5112971Z traceback.print_stack() 2023-01-11T22:54:21.5113096Z File "", line 1, in 2023-01-11T22:54:21.5113308Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5113452Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5113661Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5113815Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5114010Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5114117Z self.run() 2023-01-11T22:54:21.5114319Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5114463Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5114804Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5114937Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5115300Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5115421Z getattr(self, test_name)() 2023-01-11T22:54:21.5115762Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5115925Z fn() 2023-01-11T22:54:21.5116299Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5116419Z test(self, **param_kwargs) 2023-01-11T22:54:21.5116777Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5116901Z return func(*args, **kwargs) 2023-01-11T22:54:21.5117209Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5117325Z self.run_subtests( 2023-01-11T22:54:21.5117670Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5117833Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5118196Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5118354Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5118732Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5118855Z output = model(*input) 2023-01-11T22:54:21.5119182Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5119326Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5119687Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5119861Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5120226Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5120350Z _lazy_init(state, module) 2023-01-11T22:54:21.5120703Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5120875Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5121273Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5121419Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5121743Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5121869Z return func(*args, **kwargs) 2023-01-11T22:54:21.5122245Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5122347Z p_assert( 2023-01-11T22:54:21.5122683Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5122808Z traceback.print_stack() 2023-01-11T22:54:21.5123051Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5123287Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5123400Z File "", line 1, in 2023-01-11T22:54:21.5123610Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5123755Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5123961Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5124112Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5124326Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5124432Z self.run() 2023-01-11T22:54:21.5124544Z File "", line 1, in 2023-01-11T22:54:21.5124750Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5124962Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5125310Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5125443Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5125653Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5125794Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5126203Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5126317Z getattr(self, test_name)() 2023-01-11T22:54:21.5126518Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5126669Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5127038Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5127136Z fn() 2023-01-11T22:54:21.5127355Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5127459Z self.run() 2023-01-11T22:54:21.5127829Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in 
instantiated_test 2023-01-11T22:54:21.5127938Z test(self, **param_kwargs) 2023-01-11T22:54:21.5128136Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5128284Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5128647Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5128771Z return func(*args, **kwargs) 2023-01-11T22:54:21.5129106Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5129238Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5129479Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5129598Z self.run_subtests( 2023-01-11T22:54:21.5129959Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5130079Z getattr(self, test_name)() 2023-01-11T22:54:21.5130431Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5130596Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5130953Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5131047Z fn() 2023-01-11T22:54:21.5131396Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5131550Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5131910Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5132035Z test(self, **param_kwargs) 2023-01-11T22:54:21.5132411Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in 
_train_for_several_steps 2023-01-11T22:54:21.5132530Z output = model(*input) 2023-01-11T22:54:21.5133102Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5133235Z return func(*args, **kwargs) 2023-01-11T22:54:21.5133555Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5133695Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5133949Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5134061Z self.run_subtests( 2023-01-11T22:54:21.5134436Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5134697Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5135057Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5135220Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5135683Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5135799Z _lazy_init(state, module) 2023-01-11T22:54:21.5136174Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5136324Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5136675Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5136846Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5137222Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5137342Z output = model(*input) 
2023-01-11T22:54:21.5137740Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5137871Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5138201Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5138338Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5138677Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5138804Z return func(*args, **kwargs) 2023-01-11T22:54:21.5139182Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5139363Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5139747Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5139834Z p_assert( 2023-01-11T22:54:21.5140206Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5140332Z _lazy_init(state, module) 2023-01-11T22:54:21.5140670Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5140796Z traceback.print_stack() 2023-01-11T22:54:21.5141146Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5141314Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5141709Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5141838Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5142175Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5142300Z return func(*args, **kwargs) 2023-01-11T22:54:21.5142680Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5142782Z p_assert( 2023-01-11T22:54:21.5143118Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5143246Z traceback.print_stack() 2023-01-11T22:54:21.5143485Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5143705Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5143902Z File "", line 1, in 2023-01-11T22:54:21.5144116Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5144261Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5144463Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5144615Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5144873Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5144967Z self.run() 2023-01-11T22:54:21.5145173Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5145323Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5145669Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5145802Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5146166Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5146295Z getattr(self, test_name)() 2023-01-11T22:54:21.5146656Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5146738Z fn() 2023-01-11T22:54:21.5147104Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5147231Z test(self, **param_kwargs) 2023-01-11T22:54:21.5147586Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5147710Z return func(*args, **kwargs) 2023-01-11T22:54:21.5147964Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5148076Z self.run_subtests( 2023-01-11T22:54:21.5148435Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5148585Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5148950Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5149105Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5149485Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5149606Z output = model(*input) 2023-01-11T22:54:21.5149935Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5150078Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5150456Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5150616Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5150989Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5151115Z _lazy_init(state, module) 2023-01-11T22:54:21.5151468Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5151635Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5152034Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5152178Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5152517Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5152625Z return func(*args, **kwargs) 2023-01-11T22:54:21.5153003Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5153168Z p_assert( 2023-01-11T22:54:21.5153511Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5153636Z traceback.print_stack() 2023-01-11T22:54:21.5153767Z File "", line 1, in 2023-01-11T22:54:21.5153975Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5154170Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5154366Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5154521Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5154736Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5154841Z self.run() 2023-01-11T22:54:21.5155042Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5155190Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5155542Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5155660Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5156022Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5156146Z getattr(self, test_name)() 2023-01-11T22:54:21.5156505Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5156600Z fn() 2023-01-11T22:54:21.5156965Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5157089Z test(self, **param_kwargs) 2023-01-11T22:54:21.5157445Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5157552Z return func(*args, **kwargs) 2023-01-11T22:54:21.5157810Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5157923Z self.run_subtests( 2023-01-11T22:54:21.5158275Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5158438Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5158802Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5158953Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5159328Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5159430Z output = model(*input) 2023-01-11T22:54:21.5159758Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5159895Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5160273Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5160451Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5160815Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5160936Z _lazy_init(state, module) 2023-01-11T22:54:21.5161291Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5161459Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5161838Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5161980Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5162318Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5162518Z return func(*args, **kwargs) 2023-01-11T22:54:21.5162900Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5163004Z p_assert( 2023-01-11T22:54:21.5163344Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5163521Z traceback.print_stack() 2023-01-11T22:54:21.5163751Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5163984Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5164114Z File "", line 1, in 2023-01-11T22:54:21.5164321Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5164463Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5164673Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5164826Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5165020Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5165125Z self.run() 2023-01-11T22:54:21.5165253Z File "", line 1, in 2023-01-11T22:54:21.5165462Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5165605Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5165956Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5166087Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5166295Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5166419Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5166783Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5166910Z getattr(self, test_name)() 2023-01-11T22:54:21.5167113Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5167265Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5167626Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5167725Z fn() 2023-01-11T22:54:21.5167920Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5168022Z self.run() 2023-01-11T22:54:21.5168390Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in 
instantiated_test 2023-01-11T22:54:21.5168514Z test(self, **param_kwargs) 2023-01-11T22:54:21.5168716Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5168864Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5169225Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5169349Z return func(*args, **kwargs) 2023-01-11T22:54:21.5169667Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5169806Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5170067Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5170181Z self.run_subtests( 2023-01-11T22:54:21.5170547Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5170669Z getattr(self, test_name)() 2023-01-11T22:54:21.5171020Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5171248Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5171594Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5171696Z fn() 2023-01-11T22:54:21.5172062Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5172214Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5172621Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5172754Z test(self, **param_kwargs) 2023-01-11T22:54:21.5173279Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in 
_train_for_several_steps 2023-01-11T22:54:21.5173398Z output = model(*input) 2023-01-11T22:54:21.5173745Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5173878Z return func(*args, **kwargs) 2023-01-11T22:54:21.5174202Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5174342Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5174600Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5174716Z self.run_subtests( 2023-01-11T22:54:21.5175114Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5175290Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5175628Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5175792Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5176162Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5176287Z _lazy_init(state, module) 2023-01-11T22:54:21.5176649Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5176802Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5177155Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5177324Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5177684Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5177806Z output = model(*input) 
2023-01-11T22:54:21.5178205Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5178347Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5178677Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5178816Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5179150Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5179274Z return func(*args, **kwargs) 2023-01-11T22:54:21.5179639Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5179815Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5180193Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5180293Z p_assert( 2023-01-11T22:54:21.5180660Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5180867Z _lazy_init(state, module) 2023-01-11T22:54:21.5181204Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5181331Z traceback.print_stack() 2023-01-11T22:54:21.5181665Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5181834Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5182294Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5182447Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5182791Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5182920Z return func(*args, **kwargs) 2023-01-11T22:54:21.5183295Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5183399Z p_assert( 2023-01-11T22:54:21.5183721Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5183846Z traceback.print_stack() 2023-01-11T22:54:21.5184081Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5184318Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5184449Z File "", line 1, in 2023-01-11T22:54:21.5184656Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5184804Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5185005Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5185141Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5185353Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5185460Z self.run() 2023-01-11T22:54:21.5185665Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5185811Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5186155Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5186290Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5186653Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5186761Z getattr(self, test_name)() 2023-01-11T22:54:21.5187119Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5187219Z fn() 2023-01-11T22:54:21.5187584Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5187708Z test(self, **param_kwargs) 2023-01-11T22:54:21.5188065Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5188190Z return func(*args, **kwargs) 2023-01-11T22:54:21.5188446Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5188544Z self.run_subtests( 2023-01-11T22:54:21.5188902Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5189065Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5189426Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5189579Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5189955Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5190141Z output = model(*input) 2023-01-11T22:54:21.5190476Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5190600Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5190974Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5191196Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5191573Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5191693Z _lazy_init(state, module) 2023-01-11T22:54:21.5192046Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5192214Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5192616Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5192741Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5193079Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5193206Z return func(*args, **kwargs) 2023-01-11T22:54:21.5193585Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5193691Z p_assert( 2023-01-11T22:54:21.5194026Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5194152Z traceback.print_stack() 2023-01-11T22:54:21.5194281Z File "", line 1, in 2023-01-11T22:54:21.5194476Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5194621Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5194829Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5194982Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5195197Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5195306Z self.run() 2023-01-11T22:54:21.5195511Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5195643Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5195982Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5196114Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5196474Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5196596Z getattr(self, test_name)() 2023-01-11T22:54:21.5196958Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5197061Z fn() 2023-01-11T22:54:21.5197429Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5197535Z test(self, **param_kwargs) 2023-01-11T22:54:21.5197888Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5198015Z return func(*args, **kwargs) 2023-01-11T22:54:21.5198275Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5198390Z self.run_subtests( 2023-01-11T22:54:21.5198746Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5198905Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5199269Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5199467Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5199849Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5199968Z output = model(*input) 2023-01-11T22:54:21.5200340Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5200483Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5200868Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5201041Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5201409Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5201513Z _lazy_init(state, module) 2023-01-11T22:54:21.5201874Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5202044Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5202438Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5202581Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5202917Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5203042Z return func(*args, **kwargs) 2023-01-11T22:54:21.5203419Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5203506Z p_assert( 2023-01-11T22:54:21.5203847Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5203978Z traceback.print_stack() 2023-01-11T22:54:21.5204215Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5204450Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5204579Z File "", line 1, in 2023-01-11T22:54:21.5204790Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5204933Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5205119Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5205270Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5205486Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5205588Z self.run() 2023-01-11T22:54:21.5205793Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5205947Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5206291Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5206423Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5206768Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5206888Z getattr(self, test_name)() 2023-01-11T22:54:21.5207249Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5207349Z fn() 2023-01-11T22:54:21.5207715Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5207838Z test(self, **param_kwargs) 2023-01-11T22:54:21.5208194Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5208322Z return func(*args, **kwargs) 2023-01-11T22:54:21.5208629Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5208742Z self.run_subtests( 2023-01-11T22:54:21.5209102Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5209266Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5209671Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5209830Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5210211Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5210329Z output = model(*input) 2023-01-11T22:54:21.5210640Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5210781Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5211158Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5211332Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5211705Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5211831Z _lazy_init(state, module) 2023-01-11T22:54:21.5212184Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5212353Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5212731Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5213014Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5213369Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5213495Z return func(*args, **kwargs) 2023-01-11T22:54:21.5213873Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5213979Z p_assert( 2023-01-11T22:54:21.5214319Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5214449Z traceback.print_stack() 2023-01-11T22:54:21.5214564Z File "", line 1, in 2023-01-11T22:54:21.5214778Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5214920Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5215122Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5215272Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5215490Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5215596Z self.run() 2023-01-11T22:54:21.5215783Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5215928Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5216268Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5216403Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5216766Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5216888Z getattr(self, test_name)() 2023-01-11T22:54:21.5217246Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5217342Z fn() 2023-01-11T22:54:21.5217689Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5217899Z test(self, **param_kwargs) 2023-01-11T22:54:21.5218262Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.5218387Z return func(*args, **kwargs) 2023-01-11T22:54:21.5218644Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5218758Z self.run_subtests( 2023-01-11T22:54:21.5219174Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5219348Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5219703Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5219854Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5220231Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5220352Z output = model(*input) 2023-01-11T22:54:21.5220680Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5220815Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5221187Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5221363Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5221713Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5221838Z _lazy_init(state, module) 2023-01-11T22:54:21.5222190Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5222354Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5222752Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5222896Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.5223235Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5223361Z return func(*args, **kwargs) 2023-01-11T22:54:21.5223721Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5223826Z p_assert( 2023-01-11T22:54:21.5224159Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5224284Z traceback.print_stack() 2023-01-11T22:54:21.5224522Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5224758Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5224892Z File "", line 1, in 2023-01-11T22:54:21.5225100Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5225226Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5225424Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5232882Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5233132Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5233241Z self.run() 2023-01-11T22:54:21.5233448Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5233595Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5233979Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5234113Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5234605Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5234732Z getattr(self, test_name)() 2023-01-11T22:54:21.5235098Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5235197Z fn() 2023-01-11T22:54:21.5235624Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5235751Z test(self, **param_kwargs) 2023-01-11T22:54:21.5236114Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5236222Z return func(*args, **kwargs) 2023-01-11T22:54:21.5236484Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5236601Z self.run_subtests( 2023-01-11T22:54:21.5236960Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5237122Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5237490Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5237643Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5238020Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5238139Z output = model(*input) 2023-01-11T22:54:21.5238452Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5238588Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5238966Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5239143Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5239510Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5239630Z _lazy_init(state, module) 2023-01-11T22:54:21.5239985Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5240155Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5240538Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5240685Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5241021Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5241148Z return func(*args, **kwargs) 2023-01-11T22:54:21.5241528Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5241632Z p_assert( 2023-01-11T22:54:21.5241969Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5242096Z traceback.print_stack() 2023-01-11T22:54:21.5242210Z File "", line 1, in 2023-01-11T22:54:21.5242418Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5242566Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5242765Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5242916Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5243129Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5243232Z self.run() 2023-01-11T22:54:21.5243417Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5243629Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5243975Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5244107Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5244471Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5244592Z getattr(self, test_name)() 2023-01-11T22:54:21.5244997Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5245100Z fn() 2023-01-11T22:54:21.5245454Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5245578Z test(self, **param_kwargs) 2023-01-11T22:54:21.5245934Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5246063Z return func(*args, **kwargs) 2023-01-11T22:54:21.5246318Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5246426Z self.run_subtests( 2023-01-11T22:54:21.5246780Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5246938Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5247289Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5247440Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5247814Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5247930Z output = model(*input) 2023-01-11T22:54:21.5248255Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5248391Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5248765Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5248940Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5249290Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5249412Z _lazy_init(state, module) 2023-01-11T22:54:21.5249765Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5249931Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5250329Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5250471Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5250811Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5250934Z return func(*args, **kwargs) 2023-01-11T22:54:21.5251291Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5251395Z p_assert( 2023-01-11T22:54:21.5251733Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5251859Z traceback.print_stack() 2023-01-11T22:54:21.5252095Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5252331Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5252459Z File "", line 1, in 2023-01-11T22:54:21.5252669Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5253086Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5253310Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5253463Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5253676Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5253778Z self.run() 2023-01-11T22:54:21.5254071Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5254222Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5254578Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5254695Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5255058Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5255181Z getattr(self, test_name)() 2023-01-11T22:54:21.5255550Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5255648Z fn() 2023-01-11T22:54:21.5256014Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5256138Z test(self, **param_kwargs) 2023-01-11T22:54:21.5256480Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5256606Z return func(*args, **kwargs) 2023-01-11T22:54:21.5256863Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5256974Z self.run_subtests( 2023-01-11T22:54:21.5257325Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5257486Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5257856Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5258008Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5258368Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5258489Z output = model(*input) 2023-01-11T22:54:21.5258819Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5258955Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5259329Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5259503Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5259629Z File "", line 1, in 2023-01-11T22:54:21.5259996Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5260104Z _lazy_init(state, module) 2023-01-11T22:54:21.5260458Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5260623Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5260833Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5260977Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5261371Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5261513Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5261712Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5261846Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5262181Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5262386Z return func(*args, **kwargs) 2023-01-11T22:54:21.5262599Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5262699Z self.run() 2023-01-11T22:54:21.5263085Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5263188Z p_assert( 2023-01-11T22:54:21.5263437Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5263572Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5263913Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5264039Z traceback.print_stack() 2023-01-11T22:54:21.5264372Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5264506Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5264868Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5264989Z getattr(self, test_name)() 2023-01-11T22:54:21.5265348Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5265430Z fn() 2023-01-11T22:54:21.5265797Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5265919Z test(self, **param_kwargs) 2023-01-11T22:54:21.5266274Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5266395Z 
return func(*args, **kwargs) 2023-01-11T22:54:21.5266652Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5266767Z self.run_subtests( 2023-01-11T22:54:21.5267122Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5267268Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5267633Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5267788Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5268163Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5268281Z output = model(*input) 2023-01-11T22:54:21.5268603Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5268741Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5269115Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5269276Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5269646Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5269766Z _lazy_init(state, module) 2023-01-11T22:54:21.5270116Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5270283Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5270679Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5270820Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5271157Z 
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5271267Z return func(*args, **kwargs) 2023-01-11T22:54:21.5271712Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5271815Z p_assert( 2023-01-11T22:54:21.5272153Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5272280Z traceback.print_stack() 2023-01-11T22:54:21.5272513Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5272798Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5272934Z File "", line 1, in 2023-01-11T22:54:21.5273130Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5273272Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5273476Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5273627Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5273845Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5273947Z self.run() 2023-01-11T22:54:21.5274145Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5274275Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5274624Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5274760Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5275121Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5275245Z getattr(self, test_name)() 2023-01-11T22:54:21.5275635Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5275733Z fn() 2023-01-11T22:54:21.5276097Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5276208Z test(self, **param_kwargs) 2023-01-11T22:54:21.5276564Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5276687Z return func(*args, **kwargs) 2023-01-11T22:54:21.5276942Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5277059Z self.run_subtests( 2023-01-11T22:54:21.5277414Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5277578Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5277944Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5278081Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5278456Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5278578Z output = model(*input) 2023-01-11T22:54:21.5278908Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5279046Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5279426Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5279599Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5279969Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5280075Z _lazy_init(state, module) 2023-01-11T22:54:21.5280429Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5280595Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5281062Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5281206Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5281544Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5281668Z return func(*args, **kwargs) 2023-01-11T22:54:21.5282089Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5282197Z p_assert( 2023-01-11T22:54:21.5282526Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5282650Z traceback.print_stack() 2023-01-11T22:54:21.5282779Z File "", line 1, in 2023-01-11T22:54:21.5282987Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5283137Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5283341Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5283491Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5283687Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5283787Z self.run() 2023-01-11T22:54:21.5283992Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5284137Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5284483Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5284615Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5284976Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5285099Z getattr(self, test_name)() 2023-01-11T22:54:21.5285447Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5285546Z fn() 2023-01-11T22:54:21.5285912Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5286031Z test(self, **param_kwargs) 2023-01-11T22:54:21.5286391Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5286512Z return func(*args, **kwargs) 2023-01-11T22:54:21.5286764Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5286878Z self.run_subtests( 2023-01-11T22:54:21.5287219Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5287379Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5287748Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5287899Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5288267Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5288384Z output = model(*input) 2023-01-11T22:54:21.5288712Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5288851Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5289216Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5289387Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5289756Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5289978Z _lazy_init(state, module) 2023-01-11T22:54:21.5290335Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5290504Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5290898Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5291089Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5291426Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5291551Z return func(*args, **kwargs) 2023-01-11T22:54:21.5291926Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5292028Z p_assert( 2023-01-11T22:54:21.5292364Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5292493Z traceback.print_stack() 2023-01-11T22:54:21.5292729Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5293178Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5293299Z File "", line 1, in 2023-01-11T22:54:21.5293514Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5293657Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5293861Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5294015Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5294229Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5294330Z self.run() 2023-01-11T22:54:21.5294515Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5294665Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5295014Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5295147Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5295508Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5295634Z getattr(self, test_name)() 2023-01-11T22:54:21.5295758Z File "", line 1, in 2023-01-11T22:54:21.5296117Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5296199Z fn() 2023-01-11T22:54:21.5296564Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5296683Z test(self, **param_kwargs) 2023-01-11T22:54:21.5296891Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5297034Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5297393Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5297513Z return func(*args, **kwargs) 2023-01-11T22:54:21.5297707Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5297846Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5298103Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5298216Z self.run_subtests( 2023-01-11T22:54:21.5298426Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5298529Z self.run() 2023-01-11T22:54:21.5298878Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5299143Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5299328Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5299477Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5299848Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5300001Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5300397Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5300537Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5300918Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5301034Z output = model(*input) 2023-01-11T22:54:21.5301377Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5301508Z getattr(self, test_name)() 2023-01-11T22:54:21.5301837Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5301973Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5302329Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5302426Z fn() 2023-01-11T22:54:21.5302804Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5302976Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5303325Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5303449Z test(self, **param_kwargs) 2023-01-11T22:54:21.5303812Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5303936Z _lazy_init(state, module) 2023-01-11T22:54:21.5304290Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5304414Z return func(*args, **kwargs) 2023-01-11T22:54:21.5304767Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5304937Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5305191Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5305289Z self.run_subtests( 2023-01-11T22:54:21.5305686Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5305831Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5306180Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5306340Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5306675Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in 
decorate_context 2023-01-11T22:54:21.5306801Z return func(*args, **kwargs) 2023-01-11T22:54:21.5307164Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5307301Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5307678Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5307780Z p_assert( 2023-01-11T22:54:21.5308156Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5308276Z output = model(*input) 2023-01-11T22:54:21.5308681Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5308809Z traceback.print_stack() 2023-01-11T22:54:21.5309135Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5309256Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5309681Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5309862Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5310235Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5310353Z _lazy_init(state, module) 2023-01-11T22:54:21.5310702Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5310872Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5311267Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5311393Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.5311731Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5311860Z return func(*args, **kwargs) 2023-01-11T22:54:21.5312233Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5312336Z p_assert( 2023-01-11T22:54:21.5312668Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5312793Z traceback.print_stack() 2023-01-11T22:54:21.5313025Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5313248Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5313380Z File "", line 1, in 2023-01-11T22:54:21.5313588Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5313729Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5313927Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5314080Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5314292Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5314380Z self.run() 2023-01-11T22:54:21.5314575Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5314720Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5315064Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5315199Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5315559Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5315679Z getattr(self, test_name)() 2023-01-11T22:54:21.5316034Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5316116Z fn() 2023-01-11T22:54:21.5316481Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5316602Z test(self, **param_kwargs) 2023-01-11T22:54:21.5316960Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5317084Z return func(*args, **kwargs) 2023-01-11T22:54:21.5317339Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5317514Z self.run_subtests( 2023-01-11T22:54:21.5317877Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5318023Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5318384Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5318536Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5318951Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5319077Z output = model(*input) 2023-01-11T22:54:21.5319409Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5319545Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5319919Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5320082Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5320451Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5320572Z _lazy_init(state, module) 2023-01-11T22:54:21.5320925Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5321096Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5321492Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5321634Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5321971Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5322080Z return func(*args, **kwargs) 2023-01-11T22:54:21.5322460Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5322563Z p_assert( 2023-01-11T22:54:21.5322899Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5323022Z traceback.print_stack() 2023-01-11T22:54:21.5323152Z File "", line 1, in 2023-01-11T22:54:21.5323362Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5323507Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5323692Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5323842Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5324058Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5324159Z self.run() 2023-01-11T22:54:21.5324361Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5324512Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5324854Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5324969Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5325330Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5325456Z getattr(self, test_name)() 2023-01-11T22:54:21.5325813Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5325910Z fn() 2023-01-11T22:54:21.5326274Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5326395Z test(self, **param_kwargs) 2023-01-11T22:54:21.5326749Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5326917Z return func(*args, **kwargs) 2023-01-11T22:54:21.5327173Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5327288Z self.run_subtests( 2023-01-11T22:54:21.5327649Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5327854Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5328228Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5328379Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5328751Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5328855Z output = model(*input) 2023-01-11T22:54:21.5329182Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5329318Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5329696Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5329870Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5330236Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5330353Z _lazy_init(state, module) 2023-01-11T22:54:21.5330708Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5330875Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5331254Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5331397Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5331734Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5331855Z return func(*args, **kwargs) 2023-01-11T22:54:21.5332230Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5332330Z p_assert( 2023-01-11T22:54:21.5332668Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5332792Z traceback.print_stack() 2023-01-11T22:54:21.5333160Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5333402Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5333634Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5333874Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5334101Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5334329Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5334556Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5334785Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5334995Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5335220Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5335443Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5335666Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5336524Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5337325Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5338082Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5338825Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5339556Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5340291Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5341029Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5341762Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5342491Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.5343222Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5343954Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5344676Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5345473Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5346245Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5346980Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5347707Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5348437Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5349162Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5349924Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5350660Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5350897Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5351130Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5351363Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5351583Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5351809Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5352034Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5352261Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5352492Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5352710Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5352934Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5353157Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5353366Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5353657Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5353880Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5354104Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5354327Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5354593Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5354823Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5355578Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5356313Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5357046Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5357777Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5358502Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5359233Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5359956Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5360685Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5361409Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5362136Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5362945Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5363715Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5364448Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5365178Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5365902Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5366625Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5367333Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.5368064Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5368791Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5369518Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5370232Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5370961Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5371753Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5372529Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5373407Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5374146Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5374873Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5375597Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5376347Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5377076Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5377799Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5378529Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5379258Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5379979Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.5380794Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5381579Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5382321Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5383051Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5383775Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5384500Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5384736Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5384968Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5385195Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5385426Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5385652Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5385881Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5386090Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5386316Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5386540Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5386770Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5386995Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5387219Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5387443Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5387668Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5387890Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5388100Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5388315Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5388535Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5388710Z dist init r=0, world=2 2023-01-11T22:54:21.5389044Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5389362Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5389707Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5390019Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5390324Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5390627Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5390912Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5391214Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5391511Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5391813Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5392120Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5392420Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5392531Z dist init r=1, world=2 2023-01-11T22:54:21.5392858Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5393171Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5393476Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5393778Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5394067Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5394371Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5394671Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5394971Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5395273Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5395629Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5395931Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5396272Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5396376Z ok (5.715s) 2023-01-11T22:54:21.5396709Z test_nested_always_wrap_model_offload_true_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88283 2023-01-11T22:54:21.5396924Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88284 2023-01-11T22:54:21.5397304Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.5397478Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.5397865Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.5398055Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.5398425Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.5398601Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.5398977Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.5399165Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.5399392Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.5399640Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.5400042Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.5400437Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.5400667Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.5400895Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.5401129Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5401358Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5402378Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.5402498Z warnings.warn( 2023-01-11T22:54:21.5403513Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.5403624Z warnings.warn( 2023-01-11T22:54:21.5403800Z File "", line 1, in 2023-01-11T22:54:21.5403926Z File "", line 1, in 2023-01-11T22:54:21.5404142Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5404282Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5404496Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5404680Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5404886Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5405020Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5405222Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5405366Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5405579Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5405686Z self.run() 2023-01-11T22:54:21.5405889Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5405986Z self.run() 2023-01-11T22:54:21.5406185Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5406317Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5406518Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5406666Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5407017Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5407149Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5407482Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5407609Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5407955Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5408083Z getattr(self, test_name)() 2023-01-11T22:54:21.5408446Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5408569Z getattr(self, test_name)() 2023-01-11T22:54:21.5408927Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5409027Z fn() 2023-01-11T22:54:21.5409384Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5409478Z fn() 2023-01-11T22:54:21.5409828Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5409949Z test(self, **param_kwargs) 2023-01-11T22:54:21.5410305Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5410427Z test(self, **param_kwargs) 2023-01-11T22:54:21.5410780Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5410895Z return func(*args, **kwargs) 2023-01-11T22:54:21.5411250Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5411374Z return func(*args, **kwargs) 2023-01-11T22:54:21.5411616Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5411728Z self.run_subtests( 2023-01-11T22:54:21.5411983Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5412093Z self.run_subtests( 2023-01-11T22:54:21.5412449Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5412682Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5413222Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5413388Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5413742Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5413969Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5414343Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5414492Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5414868Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5414985Z output = model(*input) 2023-01-11T22:54:21.5415365Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5415481Z output = model(*input) 2023-01-11T22:54:21.5415793Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5415932Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5416260Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5416396Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5416776Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5416953Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5417331Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5417508Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5417860Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5417980Z _lazy_init(state, module) 2023-01-11T22:54:21.5418345Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5418461Z _lazy_init(state, module) 2023-01-11T22:54:21.5418816Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5418982Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5419334Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5419499Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5419900Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5420032Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5420438Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5420581Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5420925Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5421051Z return func(*args, **kwargs) 2023-01-11T22:54:21.5421386Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5421511Z return func(*args, **kwargs) 2023-01-11T22:54:21.5421885Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5421973Z p_assert( 2023-01-11T22:54:21.5422458Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5422562Z p_assert( 2023-01-11T22:54:21.5422899Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5423026Z traceback.print_stack() 2023-01-11T22:54:21.5423361Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5423569Z traceback.print_stack() 2023-01-11T22:54:21.5423815Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5424038Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5424166Z File "", line 1, in 2023-01-11T22:54:21.5424372Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5424516Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5424717Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5424871Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5425083Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5425171Z self.run() 2023-01-11T22:54:21.5425375Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5425522Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5425870Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5426004Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5426363Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5426487Z getattr(self, test_name)() 2023-01-11T22:54:21.5426846Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5426933Z fn() 2023-01-11T22:54:21.5427298Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5427417Z test(self, **param_kwargs) 2023-01-11T22:54:21.5427777Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5427905Z return func(*args, **kwargs) 2023-01-11T22:54:21.5428162Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5428275Z self.run_subtests( 2023-01-11T22:54:21.5428633Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5428780Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5429142Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5429296Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5429676Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5429795Z output = model(*input) 2023-01-11T22:54:21.5430122Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5430261Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5430636Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5430796Z args, kwargs = 
_root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5431165Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5431288Z _lazy_init(state, module) 2023-01-11T22:54:21.5431720Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5431889Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5432284Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5432427Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5432846Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5432966Z return func(*args, **kwargs) 2023-01-11T22:54:21.5433353Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5433458Z p_assert( 2023-01-11T22:54:21.5433790Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5433921Z traceback.print_stack() 2023-01-11T22:54:21.5434048Z File "", line 1, in 2023-01-11T22:54:21.5434257Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5434401Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5434586Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5434738Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5434951Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5435054Z self.run() 2023-01-11T22:54:21.5435257Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5435402Z self._target(*self._args, 
**self._kwargs) 2023-01-11T22:54:21.5435744Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5435877Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5436226Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5436352Z getattr(self, test_name)() 2023-01-11T22:54:21.5436709Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5436803Z fn() 2023-01-11T22:54:21.5437171Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5437294Z test(self, **param_kwargs) 2023-01-11T22:54:21.5437651Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5437760Z return func(*args, **kwargs) 2023-01-11T22:54:21.5438016Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5438128Z self.run_subtests( 2023-01-11T22:54:21.5438491Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5438650Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5439010Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5439163Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5439540Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5439658Z output = model(*input) 2023-01-11T22:54:21.5439967Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5440104Z 
return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5440483Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5440722Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5441097Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5441219Z _lazy_init(state, module) 2023-01-11T22:54:21.5441571Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5441786Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5442183Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5442325Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5442667Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5442791Z return func(*args, **kwargs) 2023-01-11T22:54:21.5443170Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5443279Z p_assert( 2023-01-11T22:54:21.5443615Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5443740Z traceback.print_stack() 2023-01-11T22:54:21.5443962Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5444202Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5444334Z File "", line 1, in 2023-01-11T22:54:21.5444470Z File "", line 1, in 2023-01-11T22:54:21.5444681Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5444824Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5445026Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5445163Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5445377Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5445516Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5445730Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5445832Z self.run() 2023-01-11T22:54:21.5446030Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5446184Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5446386Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5446518Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5446730Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5446827Z self.run() 2023-01-11T22:54:21.5447025Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5447172Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5447521Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5447656Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5448002Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5448125Z getattr(self, test_name)() 2023-01-11T22:54:21.5448462Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in 
_run 2023-01-11T22:54:21.5448594Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5448948Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5449044Z fn() 2023-01-11T22:54:21.5449404Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5449582Z getattr(self, test_name)() 2023-01-11T22:54:21.5449938Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5450061Z test(self, **param_kwargs) 2023-01-11T22:54:21.5450418Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5450512Z fn() 2023-01-11T22:54:21.5450920Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5451051Z return func(*args, **kwargs) 2023-01-11T22:54:21.5451416Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5451532Z test(self, **param_kwargs) 2023-01-11T22:54:21.5451770Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5451886Z self.run_subtests( 2023-01-11T22:54:21.5452247Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5452370Z return func(*args, **kwargs) 2023-01-11T22:54:21.5452722Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5453086Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5453357Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 
2023-01-11T22:54:21.5453471Z self.run_subtests( 2023-01-11T22:54:21.5453823Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5453974Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5454324Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5454489Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5454863Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5454981Z output = model(*input) 2023-01-11T22:54:21.5455346Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5455501Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5455810Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5455950Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5456321Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5456440Z output = model(*input) 2023-01-11T22:54:21.5456817Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5456997Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5457322Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5457460Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5457811Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5457937Z _lazy_init(state, module) 2023-01-11T22:54:21.5458313Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5458488Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5458841Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5459010Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5459468Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5459591Z _lazy_init(state, module) 2023-01-11T22:54:21.5459984Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5460111Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5460519Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5460697Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5461044Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5461169Z return func(*args, **kwargs) 2023-01-11T22:54:21.5461567Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5461714Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5462091Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5462179Z p_assert( 2023-01-11T22:54:21.5462517Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5462642Z return func(*args, **kwargs) 2023-01-11T22:54:21.5462980Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5463108Z traceback.print_stack() 2023-01-11T22:54:21.5463487Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5463585Z p_assert( 2023-01-11T22:54:21.5463913Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5464026Z traceback.print_stack() 2023-01-11T22:54:21.5464265Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5464501Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5464632Z File "", line 1, in 2023-01-11T22:54:21.5464838Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5464983Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5465182Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5465317Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5465529Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5465631Z self.run() 2023-01-11T22:54:21.5465832Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5465979Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5466325Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5466459Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5466822Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5466928Z getattr(self, test_name)() 2023-01-11T22:54:21.5467290Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5467384Z fn() 2023-01-11T22:54:21.5467752Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5467876Z test(self, **param_kwargs) 2023-01-11T22:54:21.5468233Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5468421Z return func(*args, **kwargs) 2023-01-11T22:54:21.5468677Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5468775Z self.run_subtests( 2023-01-11T22:54:21.5469138Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5469297Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5469713Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5469873Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5470253Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5470372Z output = model(*input) 2023-01-11T22:54:21.5470696Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5470820Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5471198Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5471373Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5471739Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5471859Z _lazy_init(state, module) 2023-01-11T22:54:21.5472212Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5472378Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5472774Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5472900Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5473246Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5473364Z return func(*args, **kwargs) 2023-01-11T22:54:21.5473738Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5473840Z p_assert( 2023-01-11T22:54:21.5474181Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5474308Z traceback.print_stack() 2023-01-11T22:54:21.5474437Z File "", line 1, in 2023-01-11T22:54:21.5474629Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5474769Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5474965Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5475115Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5475330Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5475429Z self.run() 2023-01-11T22:54:21.5475629Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5475758Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5476099Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5476235Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5476624Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5476748Z getattr(self, test_name)() 2023-01-11T22:54:21.5477108Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5477207Z fn() 2023-01-11T22:54:21.5477571Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5477739Z test(self, **param_kwargs) 2023-01-11T22:54:21.5478108Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5478235Z return func(*args, **kwargs) 2023-01-11T22:54:21.5478494Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5478655Z self.run_subtests( 2023-01-11T22:54:21.5479020Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5479183Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5479549Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5479686Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5480059Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5480183Z output = model(*input) 2023-01-11T22:54:21.5480508Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5480646Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5481021Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5481193Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5481559Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5481662Z _lazy_init(state, module) 2023-01-11T22:54:21.5482016Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5482183Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5482579Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5482720Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5483059Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5483182Z return func(*args, **kwargs) 2023-01-11T22:54:21.5483561Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5483665Z p_assert( 2023-01-11T22:54:21.5483987Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5484114Z traceback.print_stack() 2023-01-11T22:54:21.5484353Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5484588Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5484724Z File "", line 1, in 2023-01-11T22:54:21.5484935Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5485081Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5485266Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5485417Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5485551Z File "", line 1, in 2023-01-11T22:54:21.5485765Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5485869Z self.run() 2023-01-11T22:54:21.5486072Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5486216Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5486424Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5486640Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5486989Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5487126Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5487327Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5487478Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5487892Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5488023Z getattr(self, test_name)() 2023-01-11T22:54:21.5488219Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5488322Z self.run() 2023-01-11T22:54:21.5488691Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5488788Z fn() 2023-01-11T22:54:21.5488997Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:54:21.5489144Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5489512Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5489632Z test(self, **param_kwargs) 2023-01-11T22:54:21.5489950Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5490079Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5490438Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5490561Z return func(*args, **kwargs) 2023-01-11T22:54:21.5490921Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5491044Z getattr(self, test_name)() 2023-01-11T22:54:21.5491307Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5491419Z self.run_subtests( 2023-01-11T22:54:21.5491756Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5491854Z fn() 2023-01-11T22:54:21.5492209Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5492372Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5492738Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5493024Z test(self, **param_kwargs) 2023-01-11T22:54:21.5493393Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5493548Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5493894Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5494018Z return func(*args, **kwargs) 2023-01-11T22:54:21.5494393Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5494514Z output = model(*input) 2023-01-11T22:54:21.5494774Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5494884Z self.run_subtests( 2023-01-11T22:54:21.5495206Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5495345Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5495679Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5495841Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5496316Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5496492Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5496857Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5497009Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5497435Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5497565Z _lazy_init(state, module) 2023-01-11T22:54:21.5497932Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5498054Z output = model(*input) 2023-01-11T22:54:21.5498407Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:54:21.5498578Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5498900Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5499035Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5499424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5499569Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5499933Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5500100Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5500437Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5500566Z return func(*args, **kwargs) 2023-01-11T22:54:21.5500936Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5501056Z _lazy_init(state, module) 2023-01-11T22:54:21.5501432Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5501534Z p_assert( 2023-01-11T22:54:21.5501871Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5502042Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5502375Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5502502Z traceback.print_stack() 2023-01-11T22:54:21.5502900Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5503040Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.5503379Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5503499Z return func(*args, **kwargs) 2023-01-11T22:54:21.5503859Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5503960Z p_assert( 2023-01-11T22:54:21.5504295Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5504423Z traceback.print_stack() 2023-01-11T22:54:21.5504658Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5504894Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5505023Z File "", line 1, in 2023-01-11T22:54:21.5505233Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5505421Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5505623Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5505777Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5505992Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5506095Z self.run() 2023-01-11T22:54:21.5506339Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5506487Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5506829Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5506945Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5507307Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5507431Z getattr(self, test_name)() 2023-01-11T22:54:21.5507800Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5507899Z fn() 2023-01-11T22:54:21.5508264Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5508388Z test(self, **param_kwargs) 2023-01-11T22:54:21.5508745Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5508854Z return func(*args, **kwargs) 2023-01-11T22:54:21.5509110Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5509224Z self.run_subtests( 2023-01-11T22:54:21.5509574Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5509733Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5510102Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5510253Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5510628Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5510731Z output = model(*input) 2023-01-11T22:54:21.5511056Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5511193Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5511570Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5511742Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5512109Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5512231Z _lazy_init(state, module) 2023-01-11T22:54:21.5512579Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5512732Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5513126Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5513269Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5513610Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5513736Z return func(*args, **kwargs) 2023-01-11T22:54:21.5514111Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5514211Z p_assert( 2023-01-11T22:54:21.5514549Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5514720Z traceback.print_stack() 2023-01-11T22:54:21.5514848Z File "", line 1, in 2023-01-11T22:54:21.5515055Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5515201Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5515404Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5515606Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5515822Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5515909Z self.run() 2023-01-11T22:54:21.5516111Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5516257Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5516602Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5516738Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5517099Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5517223Z getattr(self, test_name)() 2023-01-11T22:54:21.5517585Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5517667Z fn() 2023-01-11T22:54:21.5518033Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5518156Z test(self, **param_kwargs) 2023-01-11T22:54:21.5518512Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5518639Z return func(*args, **kwargs) 2023-01-11T22:54:21.5518899Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5519014Z self.run_subtests( 2023-01-11T22:54:21.5519368Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5519514Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5519878Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5520031Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5520410Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5520530Z output = model(*input) 2023-01-11T22:54:21.5520857Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5520993Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5521369Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5521532Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5521898Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5522018Z _lazy_init(state, module) 2023-01-11T22:54:21.5522372Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5522543Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5522939Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5523080Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5523418Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5523527Z return func(*args, **kwargs) 2023-01-11T22:54:21.5523973Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5524077Z p_assert( 2023-01-11T22:54:21.5524420Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5524547Z traceback.print_stack() 2023-01-11T22:54:21.5524786Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5525069Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5525207Z File "", line 1, in 2023-01-11T22:54:21.5525318Z File "", line 1, in 2023-01-11T22:54:21.5525529Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5525674Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5525876Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5526034Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5526243Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5526387Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5526599Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5526688Z self.run() 2023-01-11T22:54:21.5526891Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5527041Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5527243Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5527388Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5527602Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5527704Z self.run() 2023-01-11T22:54:21.5528041Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5528180Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5528376Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5528519Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5528883Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5529006Z getattr(self, test_name)() 2023-01-11T22:54:21.5529343Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in 
_run 2023-01-11T22:54:21.5529477Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5529823Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5529922Z fn() 2023-01-11T22:54:21.5530278Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5530406Z getattr(self, test_name)() 2023-01-11T22:54:21.5530775Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5530896Z test(self, **param_kwargs) 2023-01-11T22:54:21.5531255Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5531350Z fn() 2023-01-11T22:54:21.5531700Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5531826Z return func(*args, **kwargs) 2023-01-11T22:54:21.5532189Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5532315Z test(self, **param_kwargs) 2023-01-11T22:54:21.5532568Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5532748Z self.run_subtests( 2023-01-11T22:54:21.5533283Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5533409Z return func(*args, **kwargs) 2023-01-11T22:54:21.5533749Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5533980Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5534250Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 
2023-01-11T22:54:21.5534365Z self.run_subtests( 2023-01-11T22:54:21.5534738Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5534892Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5535241Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5535405Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5535768Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5535888Z output = model(*input) 2023-01-11T22:54:21.5536256Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5536410Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5536735Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5536872Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5537246Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5537367Z output = model(*input) 2023-01-11T22:54:21.5537728Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5537902Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5538225Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5538364Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5538730Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5538855Z _lazy_init(state, module) 2023-01-11T22:54:21.5539229Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5539403Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5539739Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5539912Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5540277Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5540399Z _lazy_init(state, module) 2023-01-11T22:54:21.5540795Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5540943Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5541292Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5541457Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5541776Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5541904Z return func(*args, **kwargs) 2023-01-11T22:54:21.5542301Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5542521Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5542907Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5543012Z p_assert( 2023-01-11T22:54:21.5543398Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5543528Z return func(*args, **kwargs) 2023-01-11T22:54:21.5543853Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5543982Z traceback.print_stack() 2023-01-11T22:54:21.5544356Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5544459Z p_assert( 2023-01-11T22:54:21.5544794Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5544924Z traceback.print_stack() 2023-01-11T22:54:21.5545166Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5545402Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5545516Z File "", line 1, in 2023-01-11T22:54:21.5545731Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5545879Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5546081Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5546234Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5546447Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5546550Z self.run() 2023-01-11T22:54:21.5546737Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5546887Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5547229Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5547363Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5547727Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5547855Z getattr(self, test_name)() 2023-01-11T22:54:21.5548212Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5548309Z fn() 2023-01-11T22:54:21.5548656Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5548781Z test(self, **param_kwargs) 2023-01-11T22:54:21.5549137Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5549264Z return func(*args, **kwargs) 2023-01-11T22:54:21.5549522Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5549637Z self.run_subtests( 2023-01-11T22:54:21.5549993Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5550159Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5550507Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5550662Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5551037Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5551154Z output = model(*input) 2023-01-11T22:54:21.5551482Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5551698Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5552077Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5552252Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5552647Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5552775Z _lazy_init(state, module) 2023-01-11T22:54:21.5553137Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5553303Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5553701Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5553845Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5554183Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5554309Z return func(*args, **kwargs) 2023-01-11T22:54:21.5554685Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5554773Z p_assert( 2023-01-11T22:54:21.5555112Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5555239Z traceback.print_stack() 2023-01-11T22:54:21.5555369Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.5555578Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5555718Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5555919Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5556056Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5556274Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5556376Z self.run() 2023-01-11T22:54:21.5556579Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5556727Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5557072Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5557203Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5557565Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5557671Z getattr(self, test_name)() 2023-01-11T22:54:21.5558032Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5558131Z fn() 2023-01-11T22:54:21.5558499Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5558623Z test(self, **param_kwargs) 2023-01-11T22:54:21.5558981Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5559106Z return func(*args, **kwargs) 2023-01-11T22:54:21.5559365Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5559463Z self.run_subtests( 2023-01-11T22:54:21.5559819Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5559986Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5560353Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5560505Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5560956Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5561076Z output = model(*input) 2023-01-11T22:54:21.5561402Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5561523Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5561945Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5562128Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5562504Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5562627Z _lazy_init(state, module) 2023-01-11T22:54:21.5562978Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5563145Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5563536Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5563661Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5563995Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5564123Z return func(*args, **kwargs) 2023-01-11T22:54:21.5564497Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5564600Z p_assert( 2023-01-11T22:54:21.5564937Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5565062Z traceback.print_stack() 2023-01-11T22:54:21.5565300Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5565525Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5565654Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.5565867Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5566008Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5566215Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5566369Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5566582Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5566670Z self.run() 2023-01-11T22:54:21.5566874Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5567018Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5567366Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5567505Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5567871Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5567995Z getattr(self, test_name)() 2023-01-11T22:54:21.5568357Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5568439Z fn() 2023-01-11T22:54:21.5568811Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5568933Z test(self, **param_kwargs) 2023-01-11T22:54:21.5569294Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5569420Z return func(*args, **kwargs) 2023-01-11T22:54:21.5569680Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5569888Z self.run_subtests( 2023-01-11T22:54:21.5570253Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5570399Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5570768Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5570968Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5571360Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5571480Z output = model(*input) 2023-01-11T22:54:21.5571807Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5571945Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5572325Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5572488Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5572985Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5573117Z _lazy_init(state, module) 2023-01-11T22:54:21.5573480Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5573652Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5574052Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5574195Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5574326Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.5574646Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5574774Z return func(*args, **kwargs) 2023-01-11T22:54:21.5575153Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5575259Z p_assert( 2023-01-11T22:54:21.5575465Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5575608Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5575946Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5576071Z traceback.print_stack() 2023-01-11T22:54:21.5576256Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5576407Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5576621Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5576726Z self.run() 2023-01-11T22:54:21.5576933Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5577103Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5577443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5577578Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5577928Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5578052Z getattr(self, test_name)() 2023-01-11T22:54:21.5578416Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5578514Z fn() 2023-01-11T22:54:21.5578880Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5579001Z test(self, **param_kwargs) 2023-01-11T22:54:21.5579358Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5579568Z return func(*args, 
**kwargs) 2023-01-11T22:54:21.5579810Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5579925Z self.run_subtests( 2023-01-11T22:54:21.5580286Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5580508Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5580886Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5581038Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5581414Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5581536Z output = model(*input) 2023-01-11T22:54:21.5581850Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5581990Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5582368Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5582542Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5582912Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5583031Z _lazy_init(state, module) 2023-01-11T22:54:21.5583383Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5583551Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5583930Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5584082Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5584421Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5584548Z return func(*args, **kwargs) 2023-01-11T22:54:21.5584926Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5585029Z p_assert( 2023-01-11T22:54:21.5585369Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5585497Z traceback.print_stack() 2023-01-11T22:54:21.5585717Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5585953Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5586085Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.5586297Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5586445Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5586648Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5586804Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5587002Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5587106Z self.run() 2023-01-11T22:54:21.5587309Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5587455Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5587797Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5587928Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5588291Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5588482Z getattr(self, test_name)() 2023-01-11T22:54:21.5588832Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5588932Z fn() 2023-01-11T22:54:21.5589293Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5589417Z test(self, **param_kwargs) 2023-01-11T22:54:21.5589822Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5589954Z return func(*args, **kwargs) 2023-01-11T22:54:21.5590213Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5590324Z self.run_subtests( 2023-01-11T22:54:21.5590670Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5590838Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5591206Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5591361Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5591737Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5591856Z output = model(*input) 2023-01-11T22:54:21.5592186Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5592321Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5592683Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5592863Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5593229Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5593353Z _lazy_init(state, module) 2023-01-11T22:54:21.5593703Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5593870Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5594269Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5594412Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5594733Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5594862Z return func(*args, **kwargs) 2023-01-11T22:54:21.5595238Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5595340Z p_assert( 2023-01-11T22:54:21.5595682Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5595806Z traceback.print_stack() 2023-01-11T22:54:21.5595932Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.5596144Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5596272Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5596479Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5596632Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5596847Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5596950Z self.run() 2023-01-11T22:54:21.5597156Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5597303Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5597646Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5597823Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5598188Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5598312Z getattr(self, test_name)() 2023-01-11T22:54:21.5598673Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5598816Z fn() 2023-01-11T22:54:21.5599191Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5599314Z test(self, **param_kwargs) 2023-01-11T22:54:21.5599656Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5599783Z return func(*args, **kwargs) 2023-01-11T22:54:21.5600040Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5600158Z self.run_subtests( 2023-01-11T22:54:21.5600515Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5600680Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5601043Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5601200Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5601574Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5601679Z output = model(*input) 2023-01-11T22:54:21.5602003Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5602140Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5602517Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5602696Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5603061Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5603181Z _lazy_init(state, module) 2023-01-11T22:54:21.5603536Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5603689Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5604085Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5604230Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5604565Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5604694Z return func(*args, **kwargs) 2023-01-11T22:54:21.5605073Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5605173Z p_assert( 2023-01-11T22:54:21.5605502Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5605614Z traceback.print_stack() 2023-01-11T22:54:21.5605854Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5606091Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5606221Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.5606431Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5606574Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5606771Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5606969Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5607187Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5607289Z self.run() 2023-01-11T22:54:21.5607494Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5607641Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5608040Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5608176Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5608546Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5608653Z getattr(self, test_name)() 2023-01-11T22:54:21.5609015Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5609110Z fn() 2023-01-11T22:54:21.5609480Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5609601Z test(self, **param_kwargs) 2023-01-11T22:54:21.5609958Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5610084Z return func(*args, **kwargs) 2023-01-11T22:54:21.5610345Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5610443Z self.run_subtests( 2023-01-11T22:54:21.5610800Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5610962Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5611324Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5611478Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5611850Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5611969Z output = model(*input) 2023-01-11T22:54:21.5612294Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5612416Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5612800Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5613117Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5613500Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5613624Z _lazy_init(state, module) 2023-01-11T22:54:21.5613980Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5614152Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5614553Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5614681Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5615022Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5615151Z return func(*args, **kwargs) 2023-01-11T22:54:21.5615533Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5615629Z p_assert( 2023-01-11T22:54:21.5615968Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5616094Z traceback.print_stack() 2023-01-11T22:54:21.5616223Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.5616516Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5616665Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5616862Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5617014Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5617227Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5617383Z self.run() 2023-01-11T22:54:21.5617595Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5617726Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5618073Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5618208Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5618573Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5618699Z getattr(self, test_name)() 2023-01-11T22:54:21.5619061Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5619158Z fn() 2023-01-11T22:54:21.5619520Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5619627Z test(self, **param_kwargs) 2023-01-11T22:54:21.5619989Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.5620116Z return func(*args, **kwargs) 2023-01-11T22:54:21.5620375Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5620488Z self.run_subtests( 2023-01-11T22:54:21.5620843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5621008Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5621371Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5621507Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5621883Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5622005Z output = model(*input) 2023-01-11T22:54:21.5622336Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5622475Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5622852Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5623026Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5623392Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5623516Z _lazy_init(state, module) 2023-01-11T22:54:21.5623852Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5624019Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5624420Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5624567Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.5624901Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5625030Z return func(*args, **kwargs) 2023-01-11T22:54:21.5625405Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5625573Z p_assert( 2023-01-11T22:54:21.5625900Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5626024Z traceback.print_stack() 2023-01-11T22:54:21.5626263Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5626499Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5626630Z File "", line 1, in 2023-01-11T22:54:21.5626891Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5627045Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5627232Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5627384Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5627596Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5627704Z self.run() 2023-01-11T22:54:21.5627908Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5628055Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5628404Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5628539Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5628888Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5629013Z getattr(self, test_name)() 2023-01-11T22:54:21.5629375Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5629474Z fn() 2023-01-11T22:54:21.5629839Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5629962Z test(self, **param_kwargs) 2023-01-11T22:54:21.5630323Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5630447Z return func(*args, **kwargs) 2023-01-11T22:54:21.5630687Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5630801Z self.run_subtests( 2023-01-11T22:54:21.5631161Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5631324Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5631692Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5631843Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5632220Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5632337Z output = model(*input) 2023-01-11T22:54:21.5632650Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5632788Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5633164Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5633340Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5633709Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5633831Z _lazy_init(state, module) 2023-01-11T22:54:21.5634184Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5634349Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5634729Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5634936Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5635283Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5635407Z return func(*args, **kwargs) 2023-01-11T22:54:21.5635786Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5635933Z p_assert( 2023-01-11T22:54:21.5636281Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5636411Z traceback.print_stack() 2023-01-11T22:54:21.5636525Z File "", line 1, in 2023-01-11T22:54:21.5636735Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5636880Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5637082Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5637233Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5637446Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5637550Z self.run() 2023-01-11T22:54:21.5637737Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5637884Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5638229Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5638363Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5638723Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5638848Z getattr(self, test_name)() 2023-01-11T22:54:21.5639210Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5639309Z fn() 2023-01-11T22:54:21.5639657Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5639784Z test(self, **param_kwargs) 2023-01-11T22:54:21.5640140Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5640267Z return func(*args, **kwargs) 2023-01-11T22:54:21.5640528Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5640641Z self.run_subtests( 2023-01-11T22:54:21.5641000Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5641160Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5641507Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5641663Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5642042Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5642163Z output = model(*input) 2023-01-11T22:54:21.5642492Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5642633Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5643013Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5643186Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5643535Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5643657Z _lazy_init(state, module) 2023-01-11T22:54:21.5644009Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5644239Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5644642Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5644782Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5645165Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5645298Z return func(*args, **kwargs) 2023-01-11T22:54:21.5645682Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5645769Z p_assert( 2023-01-11T22:54:21.5646105Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5646239Z traceback.print_stack() 2023-01-11T22:54:21.5646478Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5646719Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5646953Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5647185Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5647415Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5647628Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5647860Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5648089Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5648317Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5648547Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5648765Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5648991Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5649215Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5649425Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5649650Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5649872Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5650133Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5650357Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5650590Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5650816Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5651042Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5651247Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5651479Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5651704Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5651929Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5652152Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5652379Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5652667Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5653090Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5653323Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5653600Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5653833Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5654057Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5654279Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5654503Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5654724Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5654953Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5655174Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5655380Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5655606Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5655829Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5656052Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5656276Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5656497Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5656723Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5656943Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5657149Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5657373Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5657488Z dist init r=1, world=2 2023-01-11T22:54:21.5657822Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5658137Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5658443Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5658752Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5659052Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5659355Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5659658Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5659958Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5660309Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5660613Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5660957Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5661263Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5661377Z dist init r=0, world=2 2023-01-11T22:54:21.5661703Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5662027Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5662334Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5662646Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5662947Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5663247Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5663532Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5663837Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5664139Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5664443Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5664746Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5665046Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5665152Z ok (6.216s) 2023-01-11T22:54:21.5665497Z test_nested_always_wrap_model_offload_true_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88366 2023-01-11T22:54:21.5665720Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88367 2023-01-11T22:54:21.5666119Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.5666297Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.5666665Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.5666856Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.5667227Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.5667463Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.5667851Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.5668041Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.5668286Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.5668575Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.5668968Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.5669368Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.5669597Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.5669829Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.5670064Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5670293Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5671320Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.5671435Z warnings.warn( 2023-01-11T22:54:21.5672450Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.5672564Z warnings.warn( 2023-01-11T22:54:21.5672695Z File "", line 1, in 2023-01-11T22:54:21.5672896Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5673042Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5673246Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5673399Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5673617Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5673720Z self.run() 2023-01-11T22:54:21.5673928Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5674075Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5674408Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5674544Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5674908Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5675035Z getattr(self, test_name)() 2023-01-11T22:54:21.5675402Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5675501Z fn() 2023-01-11T22:54:21.5675867Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5675990Z test(self, **param_kwargs) 2023-01-11T22:54:21.5676332Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5676533Z return func(*args, **kwargs) 2023-01-11T22:54:21.5676794Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5676907Z self.run_subtests( 2023-01-11T22:54:21.5677268Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5677476Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5677881Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5678034Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5678395Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5678516Z output = model(*input) 2023-01-11T22:54:21.5678848Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5678987Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5679364Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5679540Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5679912Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5680034Z _lazy_init(state, module) 2023-01-11T22:54:21.5680372Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5680542Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5680943Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5681090Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5681431Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5681557Z return func(*args, **kwargs) 2023-01-11T22:54:21.5681935Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5682038Z p_assert( 2023-01-11T22:54:21.5682361Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5682489Z traceback.print_stack() 2023-01-11T22:54:21.5682616Z File "", line 1, in 2023-01-11T22:54:21.5682823Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5682970Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5683174Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5683329Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5683528Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5683633Z self.run() 2023-01-11T22:54:21.5683835Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5683984Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5684333Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5684467Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5684830Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5684954Z getattr(self, test_name)() 2023-01-11T22:54:21.5685298Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5685461Z fn() 2023-01-11T22:54:21.5685835Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5685959Z test(self, **param_kwargs) 2023-01-11T22:54:21.5686316Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.5686440Z return func(*args, **kwargs) 2023-01-11T22:54:21.5686741Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5686861Z self.run_subtests( 2023-01-11T22:54:21.5687209Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5687373Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5687738Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5687897Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5688273Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5688390Z output = model(*input) 2023-01-11T22:54:21.5688716Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5688857Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5689220Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5689400Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5689769Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5689888Z _lazy_init(state, module) 2023-01-11T22:54:21.5690242Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5690413Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5690810Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5690951Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.5691273Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5691399Z return func(*args, **kwargs) 2023-01-11T22:54:21.5691773Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5691875Z p_assert( 2023-01-11T22:54:21.5692213Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5692337Z traceback.print_stack() 2023-01-11T22:54:21.5692576Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5692817Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5693114Z File "", line 1, in 2023-01-11T22:54:21.5693332Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5693477Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5693683Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5693832Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5694048Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5694153Z self.run() 2023-01-11T22:54:21.5694353Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5694482Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5694832Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5695048Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5695422Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5695546Z getattr(self, test_name)() 2023-01-11T22:54:21.5695910Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5696075Z fn() 2023-01-11T22:54:21.5696441Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5696566Z test(self, **param_kwargs) 2023-01-11T22:54:21.5696922Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5697050Z return func(*args, **kwargs) 2023-01-11T22:54:21.5697308Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5697428Z self.run_subtests( 2023-01-11T22:54:21.5697780Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5697942Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5698288Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5698447Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5698824Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5698943Z output = model(*input) 2023-01-11T22:54:21.5699269Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5699409Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5699786Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5699965Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5700333Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5700439Z _lazy_init(state, module) 2023-01-11T22:54:21.5700794Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5700965Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5701361Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5701508Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5701840Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5701970Z return func(*args, **kwargs) 2023-01-11T22:54:21.5702344Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5702431Z p_assert( 2023-01-11T22:54:21.5702769Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5702895Z traceback.print_stack() 2023-01-11T22:54:21.5703029Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.5703245Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5703388Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5703591Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5703726Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5703938Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5704104Z self.run() 2023-01-11T22:54:21.5704304Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5704452Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5704793Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5704924Z
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5705366Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5705483Z getattr(self, test_name)() 2023-01-11T22:54:21.5705852Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5705951Z fn() 2023-01-11T22:54:21.5706315Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5706438Z test(self, **param_kwargs) 2023-01-11T22:54:21.5706797Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5706922Z return func(*args, **kwargs) 2023-01-11T22:54:21.5707179Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5707276Z self.run_subtests( 2023-01-11T22:54:21.5707633Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5707794Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5708158Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5708312Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5708684Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5708809Z output = model(*input) 2023-01-11T22:54:21.5709136Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5709257Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5709633Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5709808Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5710181Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5710303Z _lazy_init(state, module) 2023-01-11T22:54:21.5710659Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5710827Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5711228Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5711359Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5711696Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5711823Z return func(*args, **kwargs) 2023-01-11T22:54:21.5712202Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5712307Z p_assert( 2023-01-11T22:54:21.5712647Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5712774Z traceback.print_stack() 2023-01-11T22:54:21.5713017Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5713237Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5713367Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.5713639Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5713785Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5713986Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5714137Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5714352Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5714503Z self.run() 2023-01-11T22:54:21.5714695Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5714843Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5715194Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5715328Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5715691Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5715816Z getattr(self, test_name)() 2023-01-11T22:54:21.5716175Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5716258Z fn() 2023-01-11T22:54:21.5716622Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5716744Z test(self, **param_kwargs) 2023-01-11T22:54:21.5717107Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5717230Z return func(*args, **kwargs) 2023-01-11T22:54:21.5717484Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5717598Z self.run_subtests( 2023-01-11T22:54:21.5717957Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5718107Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5718475Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5718630Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5719004Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5719126Z output = model(*input) 2023-01-11T22:54:21.5719452Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5719589Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5719965Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5720123Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5720493Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5720612Z _lazy_init(state, module) 2023-01-11T22:54:21.5720967Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5721135Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5721534Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5721678Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5722012Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5722137Z return func(*args, **kwargs) 2023-01-11T22:54:21.5722494Z File
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5722661Z p_assert( 2023-01-11T22:54:21.5723008Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5723134Z traceback.print_stack() 2023-01-11T22:54:21.5723263Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.5723473Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5723616Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5723853Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5724012Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5724227Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5724331Z self.run() 2023-01-11T22:54:21.5724537Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5724686Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5725035Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5725170Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5725515Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5725641Z getattr(self, test_name)() 2023-01-11T22:54:21.5726000Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5726099Z fn() 2023-01-11T22:54:21.5726463Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5726588Z test(self, **param_kwargs) 2023-01-11T22:54:21.5726945Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in
wrapper 2023-01-11T22:54:21.5727070Z return func(*args, **kwargs) 2023-01-11T22:54:21.5727311Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5727429Z self.run_subtests( 2023-01-11T22:54:21.5727785Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5727948Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5728315Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5728469Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5728843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5728961Z output = model(*input) 2023-01-11T22:54:21.5729270Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5729409Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5729789Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5729963Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5730332Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5730450Z _lazy_init(state, module) 2023-01-11T22:54:21.5730806Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5730973Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5731351Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5731496Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.5731836Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5732020Z return func(*args, **kwargs) 2023-01-11T22:54:21.5732406Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5732508Z p_assert( 2023-01-11T22:54:21.5732843Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5733150Z traceback.print_stack() 2023-01-11T22:54:21.5733448Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5733691Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5733823Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.5734033Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5734178Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5734383Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5734539Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5734735Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5734842Z self.run() 2023-01-11T22:54:21.5735045Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5735193Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5735554Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5735688Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5736053Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5736179Z getattr(self, test_name)() 2023-01-11T22:54:21.5736525Z
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5736626Z fn() 2023-01-11T22:54:21.5736986Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5737109Z test(self, **param_kwargs) 2023-01-11T22:54:21.5737464Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5737590Z return func(*args, **kwargs) 2023-01-11T22:54:21.5737848Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5737962Z self.run_subtests( 2023-01-11T22:54:21.5738300Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5738464Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5738826Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5738982Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5739359Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5739479Z output = model(*input) 2023-01-11T22:54:21.5739806Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5739948Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5740307Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5740483Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5740849Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5740970Z _lazy_init(state, module) 2023-01-11T22:54:21.5741323Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5741586Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5741991Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5742134Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5742505Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5742638Z return func(*args, **kwargs) 2023-01-11T22:54:21.5743026Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5743129Z p_assert( 2023-01-11T22:54:21.5743467Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5743592Z traceback.print_stack() 2023-01-11T22:54:21.5743724Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.5743936Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5744063Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5744268Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5744420Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5744633Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5744736Z self.run() 2023-01-11T22:54:21.5744937Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5745083Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5745425Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5745543Z
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5745904Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5746030Z getattr(self, test_name)() 2023-01-11T22:54:21.5746392Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5746491Z fn() 2023-01-11T22:54:21.5746857Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5746982Z test(self, **param_kwargs) 2023-01-11T22:54:21.5747322Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5747449Z return func(*args, **kwargs) 2023-01-11T22:54:21.5747705Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5747819Z self.run_subtests( 2023-01-11T22:54:21.5748173Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5748341Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5748702Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5748852Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5749230Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5749335Z output = model(*input) 2023-01-11T22:54:21.5749662Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5749798Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5750173Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5750349Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5750778Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5750892Z _lazy_init(state, module) 2023-01-11T22:54:21.5751244Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5751396Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5751843Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5751995Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5752344Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5752469Z return func(*args, **kwargs) 2023-01-11T22:54:21.5752845Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5752952Z p_assert( 2023-01-11T22:54:21.5753290Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5753401Z traceback.print_stack() 2023-01-11T22:54:21.5753638Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5753871Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5754004Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.5754213Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5754355Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5754557Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5754692Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5754903Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5755011Z self.run() 2023-01-11T22:54:21.5755210Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5755356Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5755699Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5755833Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5756198Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5756306Z getattr(self, test_name)() 2023-01-11T22:54:21.5756665Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5756764Z fn() 2023-01-11T22:54:21.5757128Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5757250Z test(self, **param_kwargs) 2023-01-11T22:54:21.5757608Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5757732Z return func(*args, **kwargs) 2023-01-11T22:54:21.5757986Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5758082Z self.run_subtests( 2023-01-11T22:54:21.5758439Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5758604Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5758967Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5759120Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5759495Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5759679Z output = model(*input) 2023-01-11T22:54:21.5760011Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5760133Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5760511Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5760727Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5761108Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5761229Z _lazy_init(state, module) 2023-01-11T22:54:21.5761585Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5761752Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5762148Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5762279Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5762618Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5762744Z return func(*args, **kwargs) 2023-01-11T22:54:21.5763122Z File
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5763225Z p_assert( 2023-01-11T22:54:21.5763562Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5763686Z traceback.print_stack() 2023-01-11T22:54:21.5763814Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.5764007Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5764153Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5764358Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5764508Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5764720Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5764824Z self.run() 2023-01-11T22:54:21.5765028Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5765161Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5765503Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5765636Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5766001Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5766125Z getattr(self, test_name)() 2023-01-11T22:54:21.5766485Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5766586Z fn() 2023-01-11T22:54:21.5766952Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5767058Z test(self, **param_kwargs) 2023-01-11T22:54:21.5767409Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in
wrapper 2023-01-11T22:54:21.5767537Z return func(*args, **kwargs) 2023-01-11T22:54:21.5767797Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5767911Z self.run_subtests( 2023-01-11T22:54:21.5768266Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5768427Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5768787Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5768985Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5769366Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5769485Z output = model(*input) 2023-01-11T22:54:21.5769807Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5769993Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5770383Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5770556Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5770922Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5771040Z _lazy_init(state, module) 2023-01-11T22:54:21.5771378Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5771546Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5771941Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5772085Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.5772418Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5772547Z return func(*args, **kwargs) 2023-01-11T22:54:21.5773054Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5773147Z p_assert( 2023-01-11T22:54:21.5773490Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5773617Z traceback.print_stack() 2023-01-11T22:54:21.5773859Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5774090Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5774220Z File "", line 1, in 2023-01-11T22:54:21.5774433Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5774576Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5774766Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5774920Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5775131Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5775232Z self.run() 2023-01-11T22:54:21.5775434Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5775579Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5775926Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5776043Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5776401Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5776526Z getattr(self, test_name)() 2023-01-11T22:54:21.5776888Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5776990Z fn() 2023-01-11T22:54:21.5777353Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5777474Z test(self, **param_kwargs) 2023-01-11T22:54:21.5777830Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5777939Z return func(*args, **kwargs) 2023-01-11T22:54:21.5778309Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5778422Z self.run_subtests( 2023-01-11T22:54:21.5778781Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5778942Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5779366Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5779528Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5779910Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5780013Z output = model(*input) 2023-01-11T22:54:21.5780339Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5780478Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5780858Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5781035Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5781404Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5781525Z _lazy_init(state, module) 2023-01-11T22:54:21.5781882Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5782051Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5782431Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5782576Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5782915Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5783041Z return func(*args, **kwargs) 2023-01-11T22:54:21.5783474Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5783576Z p_assert( 2023-01-11T22:54:21.5783915Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5784039Z traceback.print_stack() 2023-01-11T22:54:21.5784154Z File "", line 1, in 2023-01-11T22:54:21.5784365Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5784508Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5784707Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5784857Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5785069Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5785176Z self.run() 2023-01-11T22:54:21.5785361Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5785508Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5785850Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5785984Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5786351Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5786473Z getattr(self, test_name)() 2023-01-11T22:54:21.5786829Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5786929Z fn() 2023-01-11T22:54:21.5787277Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5787463Z test(self, **param_kwargs) 2023-01-11T22:54:21.5787826Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5787950Z return func(*args, **kwargs) 2023-01-11T22:54:21.5788210Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5788325Z self.run_subtests( 2023-01-11T22:54:21.5788729Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5788978Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5789333Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5789488Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5789864Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5789986Z output = model(*input) 2023-01-11T22:54:21.5790307Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5790451Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5790832Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5791009Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5791358Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5791478Z _lazy_init(state, module) 2023-01-11T22:54:21.5791832Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5791999Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5792394Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5792539Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5792874Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5792997Z return func(*args, **kwargs) 2023-01-11T22:54:21.5793361Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5793461Z p_assert( 2023-01-11T22:54:21.5793800Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5793925Z traceback.print_stack() 2023-01-11T22:54:21.5794161Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5794429Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5794562Z File "", line 1, in 2023-01-11T22:54:21.5794769Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5794895Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5795097Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5795246Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5795464Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5795568Z self.run() 2023-01-11T22:54:21.5795767Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5795909Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5796235Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5796368Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5796728Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5796917Z getattr(self, test_name)() 2023-01-11T22:54:21.5797282Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5797382Z fn() 2023-01-11T22:54:21.5797746Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5797918Z test(self, **param_kwargs) 2023-01-11T22:54:21.5798274Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5798402Z return func(*args, **kwargs) 2023-01-11T22:54:21.5798659Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5798774Z self.run_subtests( 2023-01-11T22:54:21.5799129Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5799295Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5799654Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5799809Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5800171Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5800293Z output = model(*input) 2023-01-11T22:54:21.5800621Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5800760Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5801137Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5801313Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5801679Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5801801Z _lazy_init(state, module) 2023-01-11T22:54:21.5802137Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5802306Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5802707Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5802852Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5803190Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5803316Z return func(*args, **kwargs) 2023-01-11T22:54:21.5803693Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5803796Z p_assert( 2023-01-11T22:54:21.5804117Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5804246Z traceback.print_stack() 2023-01-11T22:54:21.5804375Z File "", line 1, in 2023-01-11T22:54:21.5804586Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5804731Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5804933Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5805083Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5805294Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5805382Z self.run() 2023-01-11T22:54:21.5805583Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5805731Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5806155Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5806291Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5806654Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5806778Z getattr(self, test_name)() 2023-01-11T22:54:21.5807188Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5807277Z fn() 2023-01-11T22:54:21.5807652Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5807778Z test(self, **param_kwargs) 2023-01-11T22:54:21.5808136Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.5808261Z return func(*args, **kwargs) 2023-01-11T22:54:21.5808524Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5808637Z self.run_subtests( 2023-01-11T22:54:21.5808974Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5809134Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5809499Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5809651Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5810026Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5810146Z output = model(*input) 2023-01-11T22:54:21.5810473Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5810616Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5810989Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5811147Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5811515Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5811641Z _lazy_init(state, module) 2023-01-11T22:54:21.5811998Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5812166Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5812562Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5812706Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.5813225Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5813342Z return func(*args, **kwargs) 2023-01-11T22:54:21.5813722Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5813825Z p_assert( 2023-01-11T22:54:21.5814160Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5814289Z traceback.print_stack() 2023-01-11T22:54:21.5814524Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5814760Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5814890Z File "", line 1, in 2023-01-11T22:54:21.5815085Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5815231Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5815525Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5815676Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5815890Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5815996Z self.run() 2023-01-11T22:54:21.5816199Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5816382Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5816745Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5816877Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5817239Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5817364Z getattr(self, test_name)() 2023-01-11T22:54:21.5817725Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5817827Z fn() 2023-01-11T22:54:21.5818189Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5818295Z test(self, **param_kwargs) 2023-01-11T22:54:21.5818652Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5818781Z return func(*args, **kwargs) 2023-01-11T22:54:21.5819036Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5819150Z self.run_subtests( 2023-01-11T22:54:21.5819501Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5819661Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5820029Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5820171Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5820548Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5820668Z output = model(*input) 2023-01-11T22:54:21.5820997Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5821136Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5821511Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5821686Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5822048Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5822153Z _lazy_init(state, module) 2023-01-11T22:54:21.5822515Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5822682Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5823080Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5823222Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5823565Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5823692Z return func(*args, **kwargs) 2023-01-11T22:54:21.5824068Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5824154Z p_assert( 2023-01-11T22:54:21.5824495Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5824686Z traceback.print_stack() 2023-01-11T22:54:21.5824818Z File "", line 1, in 2023-01-11T22:54:21.5825029Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5825169Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5825371Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5825523Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5825765Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5825872Z self.run() 2023-01-11T22:54:21.5826077Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5826222Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5826570Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5826699Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5827067Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5827187Z getattr(self, test_name)() 2023-01-11T22:54:21.5827529Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5827627Z fn() 2023-01-11T22:54:21.5827999Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5828127Z test(self, **param_kwargs) 2023-01-11T22:54:21.5828484Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5828604Z return func(*args, **kwargs) 2023-01-11T22:54:21.5828861Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5828957Z self.run_subtests( 2023-01-11T22:54:21.5829311Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5829470Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5829834Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5829985Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5830365Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5830484Z output = model(*input) 2023-01-11T22:54:21.5830810Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5830932Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5831305Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5831482Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5831852Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5831974Z _lazy_init(state, module) 2023-01-11T22:54:21.5832326Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5832495Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5832890Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5833032Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5833353Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5833481Z return func(*args, **kwargs) 2023-01-11T22:54:21.5833862Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5834026Z p_assert( 2023-01-11T22:54:21.5834369Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5834494Z traceback.print_stack() 2023-01-11T22:54:21.5834731Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5835009Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5835130Z File "", line 1, in 2023-01-11T22:54:21.5835341Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5835481Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5835681Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5835832Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5836046Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5836157Z self.run() 2023-01-11T22:54:21.5836342Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5836485Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5836833Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5836965Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5837328Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5837446Z getattr(self, test_name)() 2023-01-11T22:54:21.5837807Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5837906Z fn() 2023-01-11T22:54:21.5838254Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5838380Z test(self, **param_kwargs) 2023-01-11T22:54:21.5838737Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5838858Z return func(*args, **kwargs) 2023-01-11T22:54:21.5839112Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5839226Z self.run_subtests( 2023-01-11T22:54:21.5839575Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5839739Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5840086Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5840239Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5840611Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5840734Z output = model(*input) 2023-01-11T22:54:21.5841061Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5841199Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5841574Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5841753Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5842103Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5842223Z _lazy_init(state, module) 2023-01-11T22:54:21.5842577Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5842742Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5843206Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5843349Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5843684Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5843807Z return func(*args, **kwargs) 2023-01-11T22:54:21.5844213Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5844320Z p_assert( 2023-01-11T22:54:21.5844663Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5844791Z traceback.print_stack() 2023-01-11T22:54:21.5844915Z File "", line 1, in 2023-01-11T22:54:21.5845126Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5845267Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5845470Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5845603Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5845815Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5845915Z self.run() 2023-01-11T22:54:21.5846117Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5846265Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5846602Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5846734Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5847082Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5847204Z getattr(self, test_name)() 2023-01-11T22:54:21.5847560Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5847660Z fn() 2023-01-11T22:54:21.5848029Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5848152Z test(self, **param_kwargs) 2023-01-11T22:54:21.5848509Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.5848635Z return func(*args, **kwargs) 2023-01-11T22:54:21.5848875Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5848989Z self.run_subtests( 2023-01-11T22:54:21.5849344Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5849510Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5849879Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5850028Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5850399Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5850516Z output = model(*input) 2023-01-11T22:54:21.5850830Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5850975Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5851348Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5851522Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5851885Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5852103Z _lazy_init(state, module) 2023-01-11T22:54:21.5852461Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5852624Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5853163Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5853367Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.5853728Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5853853Z return func(*args, **kwargs) 2023-01-11T22:54:21.5854230Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5854332Z p_assert( 2023-01-11T22:54:21.5854667Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5854798Z traceback.print_stack() 2023-01-11T22:54:21.5855021Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5855256Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5855382Z File "", line 1, in 2023-01-11T22:54:21.5855594Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5855741Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5855942Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5856093Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5856308Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5856396Z self.run() 2023-01-11T22:54:21.5856596Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5856746Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5857088Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5857220Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5857581Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5857704Z getattr(self, test_name)() 2023-01-11T22:54:21.5858065Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5858148Z fn() 2023-01-11T22:54:21.5858510Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5858634Z test(self, **param_kwargs) 2023-01-11T22:54:21.5858985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5859112Z return func(*args, **kwargs) 2023-01-11T22:54:21.5859368Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5859479Z self.run_subtests( 2023-01-11T22:54:21.5859834Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5859982Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5860348Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5860502Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5860875Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5860993Z output = model(*input) 2023-01-11T22:54:21.5861316Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5861532Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5861915Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5862075Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5862441Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5862613Z _lazy_init(state, module) 2023-01-11T22:54:21.5862977Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5863141Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5863533Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5863675Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5864016Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5864127Z return func(*args, **kwargs) 2023-01-11T22:54:21.5864501Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5864604Z p_assert( 2023-01-11T22:54:21.5864946Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5865074Z traceback.print_stack() 2023-01-11T22:54:21.5865198Z File "", line 1, in 2023-01-11T22:54:21.5865405Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5865585Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5865772Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5865921Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5866138Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5866242Z self.run() 2023-01-11T22:54:21.5866444Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5866592Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5866932Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5867052Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5867412Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5867535Z getattr(self, test_name)() 2023-01-11T22:54:21.5867890Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5867989Z fn() 2023-01-11T22:54:21.5868349Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5868474Z test(self, **param_kwargs) 2023-01-11T22:54:21.5868832Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5868941Z return func(*args, **kwargs) 2023-01-11T22:54:21.5869200Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5869318Z self.run_subtests( 2023-01-11T22:54:21.5869669Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5869832Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5870191Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5870338Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5870712Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5870893Z output = model(*input) 2023-01-11T22:54:21.5871228Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5871369Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5871786Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5871966Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5872336Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5872455Z _lazy_init(state, module) 2023-01-11T22:54:21.5872803Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5872956Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5873359Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5873500Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5873835Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5873957Z return func(*args, **kwargs) 2023-01-11T22:54:21.5874333Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5874436Z p_assert( 2023-01-11T22:54:21.5874769Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5874879Z traceback.print_stack() 2023-01-11T22:54:21.5875118Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5875353Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5875488Z File "", line 1, in 2023-01-11T22:54:21.5875696Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5875839Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5876043Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5876196Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5876397Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5876503Z self.run() 2023-01-11T22:54:21.5876706Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5876854Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5877200Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5877334Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5877701Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5877824Z getattr(self, test_name)() 2023-01-11T22:54:21.5878166Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5878263Z fn() 2023-01-11T22:54:21.5878652Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5878776Z test(self, **param_kwargs) 2023-01-11T22:54:21.5879132Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5879257Z return func(*args, **kwargs) 2023-01-11T22:54:21.5879512Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5879622Z self.run_subtests( 2023-01-11T22:54:21.5880030Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5880193Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5880559Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5880713Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5881131Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5881258Z output = model(*input) 2023-01-11T22:54:21.5881590Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5881729Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5882088Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5882267Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5882633Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5882754Z _lazy_init(state, module) 2023-01-11T22:54:21.5883105Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5883277Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5883673Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5883818Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5884138Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5884264Z return func(*args, **kwargs) 2023-01-11T22:54:21.5884638Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5884742Z p_assert( 2023-01-11T22:54:21.5885082Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5885205Z traceback.print_stack() 2023-01-11T22:54:21.5885329Z File "", line 1, in 2023-01-11T22:54:21.5885541Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5885668Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5885866Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5886017Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5886228Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5886326Z self.run() 2023-01-11T22:54:21.5886528Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5886680Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5887006Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5887139Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5887504Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5887625Z getattr(self, test_name)() 2023-01-11T22:54:21.5887985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5888082Z fn() 2023-01-11T22:54:21.5888446Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5888567Z test(self, **param_kwargs) 2023-01-11T22:54:21.5888905Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.5889093Z return func(*args, **kwargs) 2023-01-11T22:54:21.5889351Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5889465Z self.run_subtests( 2023-01-11T22:54:21.5889825Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5890035Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5890409Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5890561Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5890918Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5891036Z output = model(*input) 2023-01-11T22:54:21.5891359Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5891502Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5891874Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5892043Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5892410Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5892531Z _lazy_init(state, module) 2023-01-11T22:54:21.5893040Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5893254Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5893651Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5893788Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.5894129Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5894249Z return func(*args, **kwargs) 2023-01-11T22:54:21.5894621Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5894719Z p_assert( 2023-01-11T22:54:21.5895042Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5895171Z traceback.print_stack() 2023-01-11T22:54:21.5895408Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5895642Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5895772Z File "", line 1, in 2023-01-11T22:54:21.5895984Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5896131Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5896333Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5896469Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5896681Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5896787Z self.run() 2023-01-11T22:54:21.5896994Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5897136Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5897477Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5897613Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.5897978Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5898086Z getattr(self, test_name)() 2023-01-11T22:54:21.5898544Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5898641Z fn() 2023-01-11T22:54:21.5899006Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5899130Z test(self, **param_kwargs) 2023-01-11T22:54:21.5899542Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5899677Z return func(*args, **kwargs) 2023-01-11T22:54:21.5899916Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5900029Z self.run_subtests( 2023-01-11T22:54:21.5900394Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5900559Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5900928Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5901081Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5901451Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5901570Z output = model(*input) 2023-01-11T22:54:21.5901897Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5902018Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5902389Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.5902562Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5902930Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.5903055Z _lazy_init(state, module) 2023-01-11T22:54:21.5903409Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5903576Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5903975Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5904105Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5904443Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5904565Z return func(*args, **kwargs) 2023-01-11T22:54:21.5904941Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5905043Z p_assert( 2023-01-11T22:54:21.5905380Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5905506Z traceback.print_stack() 2023-01-11T22:54:21.5905632Z File "", line 1, in 2023-01-11T22:54:21.5905826Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.5905964Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.5906165Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.5906315Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.5906527Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.5906630Z self.run() 2023-01-11T22:54:21.5906831Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.5906960Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.5907297Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.5907484Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.5907851Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.5907971Z getattr(self, test_name)() 2023-01-11T22:54:21.5908326Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.5908423Z fn() 2023-01-11T22:54:21.5908829Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.5908945Z test(self, **param_kwargs) 2023-01-11T22:54:21.5909310Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.5909434Z return func(*args, **kwargs) 2023-01-11T22:54:21.5909689Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 146, in test_nested_always_wrap_model 2023-01-11T22:54:21.5909805Z self.run_subtests( 2023-01-11T22:54:21.5910159Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.5910320Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.5910680Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.5910817Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.5911195Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.5911314Z output = model(*input) 2023-01-11T22:54:21.5911638Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.5911774Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.5912144Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.5912322Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.5912689Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.5912794Z _lazy_init(state, module) 2023-01-11T22:54:21.5913145Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.5913317Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.5913714Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 223, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.5913856Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.5914191Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.5914316Z return func(*args, **kwargs) 2023-01-11T22:54:21.5914696Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.5914782Z p_assert( 2023-01-11T22:54:21.5915118Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.5915243Z traceback.print_stack() 2023-01-11T22:54:21.5915479Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5915716Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5915951Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5916187Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5916412Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5916624Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5916915Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5917146Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5917374Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5917639Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5917871Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5918101Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5918323Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5918548Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5918761Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5918984Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5919208Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5919426Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5919654Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5919879Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5920102Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5920322Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5920531Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5920757Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5920981Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5921200Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5921421Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5921643Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5921863Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5922083Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5922291Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5922515Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5922745Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5922968Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5923194Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5923417Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5923644Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5923862Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5924086Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5924292Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5924517Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5924798Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5925022Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5925244Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5925505Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5925732Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5925954Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5926159Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5926269Z dist init r=0, world=2 2023-01-11T22:54:21.5926599Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5926913Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5927221Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5927527Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5927829Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5928129Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5928488Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5928788Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5929093Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5929379Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5929678Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5929974Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.5930087Z dist init r=1, world=2 2023-01-11T22:54:21.5930410Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5930727Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5931110Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5931411Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5931711Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5932079Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5932419Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5932709Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5933176Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5933480Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5933785Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5934081Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.5934180Z ok (6.216s) 2023-01-11T22:54:21.5934519Z test_nested_wrapped_model_offload_false_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88449 2023-01-11T22:54:21.5934739Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88450 2023-01-11T22:54:21.5935129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.5935300Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.5935671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.5935864Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.5936229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.5936403Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.5936781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.5936972Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.5937215Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.5937459Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.5937863Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.5938244Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.5938472Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.5938704Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.5938935Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5939165Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5940189Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.5940390Z warnings.warn( 2023-01-11T22:54:21.5941474Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.5941592Z warnings.warn( 2023-01-11T22:54:21.5941821Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5942053Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5942275Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5942509Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5942732Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5942956Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5943185Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5943412Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5943638Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5943864Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5944074Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5944306Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5944529Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5944754Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5944980Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5945201Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5945975Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5946727Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5947469Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5948209Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5948941Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5949798Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5950548Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.5951315Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5952056Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5952783Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5953509Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5954242Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5954972Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5955704Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5956438Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5957164Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5957881Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5958719Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5959460Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5960184Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5960917Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5961644Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5962368Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5963100Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.5963823Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5964554Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5965280Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5966009Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5966731Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5967574Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5968312Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5969042Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5969772Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5970486Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5971212Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5971946Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5972185Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5972422Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5972654Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5973008Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5973246Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5973472Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5973694Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5973922Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5974131Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5974355Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5974581Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5974804Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5975026Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5975333Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5975560Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.5975782Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5976041Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5976275Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5976497Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5976721Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5976946Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5977174Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5977397Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5977620Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.5978379Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5979138Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5979877Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5980609Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5981336Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5982069Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5982796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5983519Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5984313Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5985115Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5985840Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5986571Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5987300Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5988030Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.5988755Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5989485Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5990209Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5990937Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5991664Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5992391Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5993194Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5993967Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5994700Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5995423Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5996145Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5996865Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5997590Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5998326Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5999049Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.5999769Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6000498Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6001220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6001943Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6002768Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6003501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6004221Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6004949Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6005668Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6006389Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6007118Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6007840Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6008558Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6009288Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6010009Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6010730Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6011554Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6012291Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6013162Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6013408Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6013646Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6013879Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6014108Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6014333Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6014546Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6014657Z dist init r=1, world=2 2023-01-11T22:54:21.6014769Z dist init r=0, world=2 2023-01-11T22:54:21.6014867Z ok (5.014s) 2023-01-11T22:54:21.6015201Z test_nested_wrapped_model_offload_false_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88532 2023-01-11T22:54:21.6015421Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88533 2023-01-11T22:54:21.6015804Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.6015984Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.6016350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.6016542Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.6016910Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.6017086Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.6017463Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: 
UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.6017651Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.6017893Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.6018137Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.6018535Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.6018916Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.6019146Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.6019460Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.6019695Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6019926Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6021015Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.6021140Z warnings.warn( 2023-01-11T22:54:21.6022165Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.6022282Z warnings.warn( 2023-01-11T22:54:21.6022517Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6022750Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6022964Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6023196Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6023425Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6023656Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6023882Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6024112Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6024330Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6024559Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6024766Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6024991Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6025215Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6025441Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6025664Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6025888Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6026896Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.6027129Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T22:54:21.6027867Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6028673Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6029455Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6030206Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6030938Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6031176Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6031405Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6031622Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6031852Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6032080Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6032306Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6032536Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6032762Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6032989Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6033211Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6033420Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6033646Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6033872Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6034097Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6034326Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6034549Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6034774Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6034999Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6035206Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6035430Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6035654Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6035877Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6036164Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6036383Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6037185Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6037937Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6038675Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6039414Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6040146Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6040875Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6041615Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6042341Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6043070Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6043799Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6044528Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6044761Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6045056Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6045272Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6045505Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6045728Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6045996Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6046108Z dist init r=1, world=2 2023-01-11T22:54:21.6046218Z dist init r=0, world=2 2023-01-11T22:54:21.6046317Z ok (5.315s) 2023-01-11T22:54:21.6046658Z test_nested_wrapped_model_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88615 2023-01-11T22:54:21.6046863Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88616 2023-01-11T22:54:21.6047250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.6047421Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.6047800Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.6047991Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.6048362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.6048538Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.6048915Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.6049090Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.6049337Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.6049580Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.6049974Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.6050369Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.6050601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.6050826Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.6051057Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6051289Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6052309Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.6052428Z warnings.warn( 2023-01-11T22:54:21.6053648Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.6053768Z warnings.warn( 2023-01-11T22:54:21.6054099Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6054330Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6054559Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6054791Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6055065Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6055295Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6055517Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6055731Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6055952Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6056177Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6056401Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6056629Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6056853Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6057079Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6057300Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6057510Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6058520Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 
2023-01-11T22:54:21.6058757Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T22:54:21.6059501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6060243Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6060988Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6061729Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6062458Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6062755Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6062989Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6063216Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6063486Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6063716Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6063943Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6064168Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6064378Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6064605Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6064830Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6065054Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6065275Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6065501Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6065723Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6065946Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6066154Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6066376Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6066601Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6066826Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6067045Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6067270Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6067494Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6067718Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6067939Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6068680Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6069430Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6070171Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6070905Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6071708Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6072484Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6073229Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6073970Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6074711Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6075439Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6076172Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6076407Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6076640Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6076872Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6077104Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6077331Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6077563Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6077676Z dist init r=0, world=2 2023-01-11T22:54:21.6077768Z dist init r=1, world=2 2023-01-11T22:54:21.6077870Z ok (5.315s) 2023-01-11T22:54:21.6078203Z test_nested_wrapped_model_offload_true_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88698 2023-01-11T22:54:21.6078424Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88699 2023-01-11T22:54:21.6078805Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.6078979Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.6079364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.6079638Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.6079997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.6080174Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.6080566Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.6080803Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.6081053Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.6081304Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.6081715Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.6082113Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.6082346Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.6082555Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.6082790Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6083021Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6084049Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.6095614Z warnings.warn( 2023-01-11T22:54:21.6096810Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.6096932Z warnings.warn( 2023-01-11T22:54:21.6097065Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.6097300Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6097435Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6097654Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6097816Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6098053Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6098161Z self.run() 2023-01-11T22:54:21.6098373Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6098525Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6098888Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6099026Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6099414Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6099543Z getattr(self, test_name)() 2023-01-11T22:54:21.6099933Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6100038Z fn() 2023-01-11T22:54:21.6100439Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6100725Z test(self, **param_kwargs) 2023-01-11T22:54:21.6101103Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6101233Z return func(*args, **kwargs) 2023-01-11T22:54:21.6101578Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6101706Z self.run_subtests( 2023-01-11T22:54:21.6102092Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6102290Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6102701Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6102861Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6103273Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6103382Z output = model(*input) 2023-01-11T22:54:21.6103735Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6103883Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6104292Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6104476Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6104869Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6104989Z _lazy_init(state, module) 2023-01-11T22:54:21.6105367Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6105531Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6105957Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6106106Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6106466Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6106601Z return func(*args, **kwargs) 2023-01-11T22:54:21.6107008Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6107115Z p_assert( 2023-01-11T22:54:21.6107476Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6107593Z traceback.print_stack() 2023-01-11T22:54:21.6107729Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.6107954Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6108108Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6108325Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6108484Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6108712Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6108819Z self.run() 2023-01-11T22:54:21.6109022Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6109177Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6109545Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6109686Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6110072Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6110267Z getattr(self, test_name)() 2023-01-11T22:54:21.6110660Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6110761Z fn() 2023-01-11T22:54:21.6111136Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6111266Z test(self, **param_kwargs) 2023-01-11T22:54:21.6111700Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6111837Z return func(*args, **kwargs) 2023-01-11T22:54:21.6112106Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6112227Z self.run_subtests( 2023-01-11T22:54:21.6112614Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6112780Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6113159Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6113322Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6113724Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6113850Z output = model(*input) 2023-01-11T22:54:21.6114199Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6114341Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6114743Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6114929Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6115306Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6115436Z _lazy_init(state, module) 2023-01-11T22:54:21.6115818Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6115996Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6116428Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6116579Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6116943Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6117072Z return func(*args, **kwargs) 2023-01-11T22:54:21.6117460Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6117565Z p_assert( 2023-01-11T22:54:21.6117927Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6118064Z traceback.print_stack() 2023-01-11T22:54:21.6118312Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6118560Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6118695Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.6118923Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6119059Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6119274Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6119435Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6119662Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6119764Z self.run() 2023-01-11T22:54:21.6119978Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6120198Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6120559Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6120699Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6121088Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6121264Z getattr(self, test_name)() 
2023-01-11T22:54:21.6121660Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6121758Z fn() 2023-01-11T22:54:21.6122150Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6122279Z test(self, **param_kwargs) 2023-01-11T22:54:21.6122645Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6122782Z return func(*args, **kwargs) 2023-01-11T22:54:21.6123055Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6123174Z self.run_subtests( 2023-01-11T22:54:21.6123555Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6123730Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6124122Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6124282Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6124668Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6124795Z output = model(*input) 2023-01-11T22:54:21.6125150Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6125302Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6125710Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6125892Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6126289Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.6126416Z _lazy_init(state, module) 2023-01-11T22:54:21.6126778Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6126956Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6127386Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6127540Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6127902Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6128034Z return func(*args, **kwargs) 2023-01-11T22:54:21.6128442Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6128548Z p_assert( 2023-01-11T22:54:21.6128898Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6129032Z traceback.print_stack() 2023-01-11T22:54:21.6129165Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.6129390Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6129542Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6129757Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6129996Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6130226Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6130315Z self.run() 2023-01-11T22:54:21.6130534Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6130688Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6131116Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6131260Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.6131655Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6131785Z getattr(self, test_name)() 2023-01-11T22:54:21.6132171Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6132254Z fn() 2023-01-11T22:54:21.6132654Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6132785Z test(self, **param_kwargs) 2023-01-11T22:54:21.6133403Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6133534Z return func(*args, **kwargs) 2023-01-11T22:54:21.6133813Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6133931Z self.run_subtests( 2023-01-11T22:54:21.6134296Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6134467Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6134862Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6135024Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6135432Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6135558Z output = model(*input) 2023-01-11T22:54:21.6135907Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6136053Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6136459Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.6136627Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6137024Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6137152Z _lazy_init(state, module) 2023-01-11T22:54:21.6137531Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6137713Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6138141Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6138292Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6138656Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6138775Z return func(*args, **kwargs) 2023-01-11T22:54:21.6139186Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6139294Z p_assert( 2023-01-11T22:54:21.6139656Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6139786Z traceback.print_stack() 2023-01-11T22:54:21.6140034Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6140412Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6140549Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.6140756Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6140905Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6141121Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6141339Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6141566Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6141675Z self.run() 2023-01-11T22:54:21.6141891Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6142027Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6142405Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6142552Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6142945Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6143075Z getattr(self, test_name)() 2023-01-11T22:54:21.6143464Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6143566Z fn() 2023-01-11T22:54:21.6143963Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6144076Z test(self, **param_kwargs) 2023-01-11T22:54:21.6144464Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6144597Z return func(*args, **kwargs) 2023-01-11T22:54:21.6144869Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6144991Z self.run_subtests( 2023-01-11T22:54:21.6145370Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6145540Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6145937Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6146084Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6146493Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6146617Z output = model(*input) 2023-01-11T22:54:21.6146968Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6147114Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6147516Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6147704Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6148101Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6148210Z _lazy_init(state, module) 2023-01-11T22:54:21.6148595Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6148775Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6149205Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6149357Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6149721Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6149854Z return func(*args, **kwargs) 2023-01-11T22:54:21.6150057Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.6150456Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6150563Z p_assert( 2023-01-11T22:54:21.6150927Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6151064Z traceback.print_stack() 2023-01-11T22:54:21.6151341Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6151497Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6151715Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6151880Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6152091Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6152199Z self.run() 2023-01-11T22:54:21.6152425Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6152577Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6152947Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6153089Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6153477Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6153593Z getattr(self, test_name)() 2023-01-11T22:54:21.6153979Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6154083Z fn() 2023-01-11T22:54:21.6154476Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6154607Z test(self, **param_kwargs) 2023-01-11T22:54:21.6154990Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6155124Z return func(*args, 
**kwargs) 2023-01-11T22:54:21.6155398Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6155499Z self.run_subtests( 2023-01-11T22:54:21.6155882Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6156061Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6156454Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6156616Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6157023Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6157149Z output = model(*input) 2023-01-11T22:54:21.6157507Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6157635Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6158040Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6158225Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6158623Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6158752Z _lazy_init(state, module) 2023-01-11T22:54:21.6159133Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6159313Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6159747Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6159965Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6160319Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6160449Z return func(*args, **kwargs) 2023-01-11T22:54:21.6160860Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6160968Z p_assert( 2023-01-11T22:54:21.6161385Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6161523Z traceback.print_stack() 2023-01-11T22:54:21.6161774Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6162004Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6162810Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6163613Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6163751Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.6163978Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6164129Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6164347Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6164507Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6164738Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6164847Z self.run() 2023-01-11T22:54:21.6165044Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6165202Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6165575Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6165718Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6166109Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6166242Z getattr(self, test_name)() 2023-01-11T22:54:21.6166629Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6166729Z fn() 2023-01-11T22:54:21.6167103Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6167239Z test(self, **param_kwargs) 2023-01-11T22:54:21.6167622Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6167753Z return func(*args, **kwargs) 2023-01-11T22:54:21.6168024Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6168142Z self.run_subtests( 2023-01-11T22:54:21.6168525Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6168694Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6169071Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6169234Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6169641Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6169866Z output = model(*input) 2023-01-11T22:54:21.6170223Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6170369Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6170773Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6171008Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6171399Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6171525Z _lazy_init(state, module) 2023-01-11T22:54:21.6171906Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6172083Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6172519Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6172669Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6173181Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6173315Z return func(*args, **kwargs) 2023-01-11T22:54:21.6173717Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6173827Z p_assert( 2023-01-11T22:54:21.6174194Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6174328Z traceback.print_stack() 2023-01-11T22:54:21.6174463Z File "", line 1, in 2023-01-11T22:54:21.6174690Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6174843Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6175044Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6175207Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6175436Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6175546Z self.run() 2023-01-11T22:54:21.6175794Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6175950Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6176319Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6176457Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6176829Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6176961Z getattr(self, test_name)() 2023-01-11T22:54:21.6177356Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6177459Z fn() 2023-01-11T22:54:21.6177854Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6177985Z test(self, **param_kwargs) 2023-01-11T22:54:21.6178374Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6178508Z return func(*args, **kwargs) 2023-01-11T22:54:21.6178764Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6178884Z self.run_subtests( 2023-01-11T22:54:21.6179268Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6179438Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6179931Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6180117Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6180525Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6180651Z output = model(*input) 2023-01-11T22:54:21.6181050Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6181209Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6181624Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6181807Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6182204Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6182335Z _lazy_init(state, module) 2023-01-11T22:54:21.6182718Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6182898Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6183312Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6183466Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6183827Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6183960Z return func(*args, **kwargs) 2023-01-11T22:54:21.6184367Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6184475Z p_assert( 2023-01-11T22:54:21.6184835Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6184971Z traceback.print_stack() 2023-01-11T22:54:21.6185203Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6185452Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6185588Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.6185813Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6185968Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6186185Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6186343Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6186568Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6186658Z self.run() 2023-01-11T22:54:21.6186874Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6187034Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6187405Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6187546Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6187931Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6188062Z getattr(self, test_name)() 2023-01-11T22:54:21.6188449Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6188533Z fn() 2023-01-11T22:54:21.6188924Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6189052Z test(self, **param_kwargs) 2023-01-11T22:54:21.6189434Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6189626Z return func(*args, **kwargs) 2023-01-11T22:54:21.6189898Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6190017Z self.run_subtests( 2023-01-11T22:54:21.6190382Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6190552Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6190994Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6191161Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6191571Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6191698Z output = model(*input) 2023-01-11T22:54:21.6192048Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6192199Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6192585Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6192769Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6193162Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.6193293Z _lazy_init(state, module) 2023-01-11T22:54:21.6193669Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6193846Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6194274Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6194425Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6194791Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6194906Z return func(*args, **kwargs) 2023-01-11T22:54:21.6195313Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6195420Z p_assert( 2023-01-11T22:54:21.6195786Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6195919Z traceback.print_stack() 2023-01-11T22:54:21.6196055Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.6196281Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6196411Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6196628Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6196788Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6197019Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6197124Z self.run() 2023-01-11T22:54:21.6197340Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6197494Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6197864Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6197989Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.6198377Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6198508Z getattr(self, test_name)() 2023-01-11T22:54:21.6198894Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6198996Z fn() 2023-01-11T22:54:21.6199389Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6199593Z test(self, **param_kwargs) 2023-01-11T22:54:21.6199977Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6200092Z return func(*args, **kwargs) 2023-01-11T22:54:21.6200363Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6200530Z self.run_subtests( 2023-01-11T22:54:21.6200924Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6201094Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6201488Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6201651Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6202056Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6202168Z output = model(*input) 2023-01-11T22:54:21.6202515Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6202664Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6203072Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.6203256Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6203649Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6203776Z _lazy_init(state, module) 2023-01-11T22:54:21.6204158Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6204317Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6204750Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6204898Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6205264Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6205398Z return func(*args, **kwargs) 2023-01-11T22:54:21.6205806Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6205915Z p_assert( 2023-01-11T22:54:21.6206275Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6206389Z traceback.print_stack() 2023-01-11T22:54:21.6206636Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6206882Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6207024Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.6207247Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6207396Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6207611Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6207772Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6207985Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6208095Z self.run() 2023-01-11T22:54:21.6208315Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6208469Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6208840Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6208979Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6209435Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6209547Z getattr(self, test_name)() 2023-01-11T22:54:21.6209933Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6210033Z fn() 2023-01-11T22:54:21.6210477Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6210612Z test(self, **param_kwargs) 2023-01-11T22:54:21.6211004Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6211136Z return func(*args, **kwargs) 2023-01-11T22:54:21.6211409Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6211510Z self.run_subtests( 2023-01-11T22:54:21.6211896Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6212068Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6212464Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6212627Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6213231Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6213360Z output = model(*input) 2023-01-11T22:54:21.6213717Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6213845Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6214245Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6214434Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6214832Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6214959Z _lazy_init(state, module) 2023-01-11T22:54:21.6215338Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6215520Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6215948Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6216098Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6216443Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6216578Z return func(*args, **kwargs) 2023-01-11T22:54:21.6216984Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6217094Z p_assert( 2023-01-11T22:54:21.6217459Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6217593Z traceback.print_stack() 2023-01-11T22:54:21.6217732Z File "", line 1, in 2023-01-11T22:54:21.6217940Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6218092Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6218305Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6218465Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6218692Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6218798Z self.run() 2023-01-11T22:54:21.6219018Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6219264Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6219625Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6219763Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6220155Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6220286Z getattr(self, test_name)() 2023-01-11T22:54:21.6220744Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6220853Z fn() 2023-01-11T22:54:21.6221254Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6221387Z test(self, **param_kwargs) 2023-01-11T22:54:21.6221752Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6221890Z return func(*args, **kwargs) 2023-01-11T22:54:21.6222161Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6222281Z self.run_subtests( 2023-01-11T22:54:21.6222662Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6222834Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6223230Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6223392Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6223781Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6223908Z output = model(*input) 2023-01-11T22:54:21.6224261Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6224409Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6224811Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6224998Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6225394Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6225525Z _lazy_init(state, module) 2023-01-11T22:54:21.6225887Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6226067Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6226547Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6226695Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6227069Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6227200Z return func(*args, **kwargs) 2023-01-11T22:54:21.6227607Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6227715Z p_assert( 2023-01-11T22:54:21.6228064Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6228196Z traceback.print_stack() 2023-01-11T22:54:21.6228447Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6228696Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6228833Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.6229057Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6229278Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6229494Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6229635Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6229863Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6229973Z self.run() 2023-01-11T22:54:21.6230235Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6230394Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6230771Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6230913Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6231285Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6231417Z getattr(self, test_name)() 2023-01-11T22:54:21.6231808Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6231914Z fn() 2023-01-11T22:54:21.6232308Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6232439Z test(self, **param_kwargs) 2023-01-11T22:54:21.6232826Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6232960Z return func(*args, **kwargs) 2023-01-11T22:54:21.6233213Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6233332Z self.run_subtests( 2023-01-11T22:54:21.6233711Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6233883Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6234281Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6234443Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6234847Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6234973Z output = model(*input) 2023-01-11T22:54:21.6235306Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6235453Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6235857Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6236041Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6236436Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.6236569Z _lazy_init(state, module) 2023-01-11T22:54:21.6236949Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6237127Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6237536Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6237692Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6238055Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6238187Z return func(*args, **kwargs) 2023-01-11T22:54:21.6238595Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6238704Z p_assert( 2023-01-11T22:54:21.6239065Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6239264Z traceback.print_stack() 2023-01-11T22:54:21.6239384Z File "", line 1, in 2023-01-11T22:54:21.6239609Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6239758Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6239977Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6240178Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6240408Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6240517Z self.run() 2023-01-11T22:54:21.6240731Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6240868Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6241240Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6241384Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.6241774Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6241904Z getattr(self, test_name)() 2023-01-11T22:54:21.6242294Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6242398Z fn() 2023-01-11T22:54:21.6242776Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6242908Z test(self, **param_kwargs) 2023-01-11T22:54:21.6243294Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6243426Z return func(*args, **kwargs) 2023-01-11T22:54:21.6243699Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6243821Z self.run_subtests( 2023-01-11T22:54:21.6244204Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6244376Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6244755Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6244917Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6245326Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6245453Z output = model(*input) 2023-01-11T22:54:21.6245803Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6245950Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6246358Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.6246546Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6246940Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6247050Z _lazy_init(state, module) 2023-01-11T22:54:21.6247433Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6247614Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6248042Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6248193Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6248555Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6248687Z return func(*args, **kwargs) 2023-01-11T22:54:21.6249175Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6249266Z p_assert( 2023-01-11T22:54:21.6249629Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6249758Z traceback.print_stack() 2023-01-11T22:54:21.6250002Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6250304Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6251154Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6251952Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6252097Z File "", line 1, in 2023-01-11T22:54:21.6252321Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6252454Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6252675Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6252829Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6253310Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6253423Z self.run() 2023-01-11T22:54:21.6253645Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6253802Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6254180Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6254302Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6254693Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6254824Z getattr(self, test_name)() 2023-01-11T22:54:21.6255213Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6255316Z fn() 2023-01-11T22:54:21.6255708Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6255839Z test(self, **param_kwargs) 2023-01-11T22:54:21.6256228Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:54:21.6256341Z return func(*args, **kwargs) 2023-01-11T22:54:21.6256616Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6256736Z self.run_subtests( 2023-01-11T22:54:21.6257118Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6257289Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6257690Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6257853Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6258256Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6258364Z output = model(*input) 2023-01-11T22:54:21.6258716Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6258864Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6259365Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6259548Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6259942Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6260070Z _lazy_init(state, module) 2023-01-11T22:54:21.6260522Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6260691Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6261128Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6261278Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6261640Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6261779Z return func(*args, **kwargs) 2023-01-11T22:54:21.6262186Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6262294Z p_assert( 2023-01-11T22:54:21.6262654Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6262773Z traceback.print_stack() 2023-01-11T22:54:21.6262912Z File "", line 1, in 2023-01-11T22:54:21.6263136Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6263286Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6263505Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6263668Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6263895Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6263988Z self.run() 2023-01-11T22:54:21.6264207Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6264359Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6264719Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6264861Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6265253Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6265386Z getattr(self, test_name)() 2023-01-11T22:54:21.6265772Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6265856Z fn() 2023-01-11T22:54:21.6266248Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6266383Z test(self, **param_kwargs) 2023-01-11T22:54:21.6266770Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6266902Z return func(*args, **kwargs) 2023-01-11T22:54:21.6267177Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6267296Z self.run_subtests( 2023-01-11T22:54:21.6267678Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6267832Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6268229Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6268394Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6268802Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6269002Z output = model(*input) 2023-01-11T22:54:21.6269360Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6269509Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6269915Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6270129Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6270536Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6270664Z _lazy_init(state, module) 2023-01-11T22:54:21.6271048Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:54:21.6271227Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6271665Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6271816Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6272181Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6272295Z return func(*args, **kwargs) 2023-01-11T22:54:21.6272705Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6272813Z p_assert( 2023-01-11T22:54:21.6273178Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6273309Z traceback.print_stack() 2023-01-11T22:54:21.6273558Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6273806Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6273947Z File "", line 1, in 2023-01-11T22:54:21.6274154Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6274305Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6274521Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6274682Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6274915Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6275022Z self.run() 2023-01-11T22:54:21.6275242Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6275398Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6275755Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6275895Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6276283Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6276417Z getattr(self, test_name)() 2023-01-11T22:54:21.6276808Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6276913Z fn() 2023-01-11T22:54:21.6277304Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6277435Z test(self, **param_kwargs) 2023-01-11T22:54:21.6277805Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6277940Z return func(*args, **kwargs) 2023-01-11T22:54:21.6278211Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6278330Z self.run_subtests( 2023-01-11T22:54:21.6278711Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6278946Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6279346Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6279510Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6279944Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6280077Z output = model(*input) 2023-01-11T22:54:21.6280432Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6280578Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6281011Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6281196Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6281594Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6281722Z _lazy_init(state, module) 2023-01-11T22:54:21.6282085Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6282264Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6282692Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6282845Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6283207Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6283340Z return func(*args, **kwargs) 2023-01-11T22:54:21.6283744Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6283857Z p_assert( 2023-01-11T22:54:21.6284202Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6284336Z traceback.print_stack() 2023-01-11T22:54:21.6284472Z File "", line 1, in 2023-01-11T22:54:21.6284696Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6284851Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6285071Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6285233Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6285445Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6285554Z self.run() 2023-01-11T22:54:21.6285773Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6285927Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6286299Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6286437Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6286824Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6286956Z getattr(self, test_name)() 2023-01-11T22:54:21.6287327Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6287431Z fn() 2023-01-11T22:54:21.6287828Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6287959Z test(self, **param_kwargs) 2023-01-11T22:54:21.6288343Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6288474Z return func(*args, **kwargs) 2023-01-11T22:54:21.6288809Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6288927Z self.run_subtests( 2023-01-11T22:54:21.6289294Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6289465Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6289947Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6290115Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6290530Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6290659Z output = model(*input) 2023-01-11T22:54:21.6291013Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6291164Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6291552Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6291736Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6292133Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6292264Z _lazy_init(state, module) 2023-01-11T22:54:21.6292648Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6292828Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6293515Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6293667Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6294014Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6294154Z return func(*args, **kwargs) 2023-01-11T22:54:21.6294561Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6294667Z p_assert( 2023-01-11T22:54:21.6295033Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6295170Z traceback.print_stack() 2023-01-11T22:54:21.6295420Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6295669Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6295786Z File "", line 1, in 2023-01-11T22:54:21.6296011Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6296161Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6296382Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6296543Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6296772Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6296881Z self.run() 2023-01-11T22:54:21.6297096Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6297234Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6297605Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6297747Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6298137Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6298266Z getattr(self, test_name)() 2023-01-11T22:54:21.6298650Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6298841Z fn() 2023-01-11T22:54:21.6299226Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6299355Z test(self, **param_kwargs) 2023-01-11T22:54:21.6299743Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6299940Z return func(*args, **kwargs) 2023-01-11T22:54:21.6300219Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6300338Z self.run_subtests( 2023-01-11T22:54:21.6300721Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6300896Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6301271Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6301437Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6301841Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6301965Z output = model(*input) 2023-01-11T22:54:21.6302319Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6302468Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6302873Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6303055Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6303448Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.6303562Z _lazy_init(state, module) 2023-01-11T22:54:21.6303949Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6304128Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6304561Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6304709Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6305077Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6305207Z return func(*args, **kwargs) 2023-01-11T22:54:21.6305611Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6305700Z p_assert( 2023-01-11T22:54:21.6306064Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6306201Z traceback.print_stack() 2023-01-11T22:54:21.6306336Z File "", line 1, in 2023-01-11T22:54:21.6306561Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6306714Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6306931Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6307073Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6307304Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6307413Z self.run() 2023-01-11T22:54:21.6307632Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6307788Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6308158Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6308297Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.6308758Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6308869Z getattr(self, test_name)() 2023-01-11T22:54:21.6309261Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6309364Z fn() 2023-01-11T22:54:21.6309803Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6309940Z test(self, **param_kwargs) 2023-01-11T22:54:21.6310329Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6310459Z return func(*args, **kwargs) 2023-01-11T22:54:21.6310731Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6310832Z self.run_subtests( 2023-01-11T22:54:21.6311211Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6311386Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6311782Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6311943Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6312353Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6312480Z output = model(*input) 2023-01-11T22:54:21.6312827Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6312955Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6313360Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.6313544Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6313943Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6314072Z _lazy_init(state, module) 2023-01-11T22:54:21.6314450Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6314631Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6315062Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6315195Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6315557Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6315690Z return func(*args, **kwargs) 2023-01-11T22:54:21.6316099Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6316210Z p_assert( 2023-01-11T22:54:21.6316571Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6316703Z traceback.print_stack() 2023-01-11T22:54:21.6316950Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6317181Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6317318Z File "", line 1, in 2023-01-11T22:54:21.6317538Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6317685Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6317902Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6318061Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6318287Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6318455Z self.run() 2023-01-11T22:54:21.6318655Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6318808Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6319184Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6319324Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6319772Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6319908Z getattr(self, test_name)() 2023-01-11T22:54:21.6320304Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6320389Z fn() 2023-01-11T22:54:21.6320781Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6320915Z test(self, **param_kwargs) 2023-01-11T22:54:21.6321301Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6321434Z return func(*args, **kwargs) 2023-01-11T22:54:21.6321708Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6321826Z self.run_subtests( 2023-01-11T22:54:21.6322209Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6322363Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6322753Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6322915Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6323323Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6323453Z output = model(*input) 2023-01-11T22:54:21.6323807Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6323954Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6324358Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6324526Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6324920Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6325048Z _lazy_init(state, module) 2023-01-11T22:54:21.6325431Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6325608Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6326040Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6326190Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6326552Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6326687Z return func(*args, **kwargs) 2023-01-11T22:54:21.6327081Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6327188Z p_assert( 2023-01-11T22:54:21.6327552Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6327685Z traceback.print_stack() 2023-01-11T22:54:21.6327819Z File "", line 1, in 2023-01-11T22:54:21.6328047Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6328196Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6328460Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6328619Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6328848Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6328955Z self.run() 2023-01-11T22:54:21.6329172Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6329377Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6329764Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6329905Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6330275Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6330408Z getattr(self, test_name)() 2023-01-11T22:54:21.6330801Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6330908Z fn() 2023-01-11T22:54:21.6331302Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6331431Z test(self, **param_kwargs) 2023-01-11T22:54:21.6331818Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6331954Z return func(*args, **kwargs) 2023-01-11T22:54:21.6332206Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6332329Z self.run_subtests( 2023-01-11T22:54:21.6332708Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6333069Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6333474Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6333641Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6334048Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6334172Z output = model(*input) 2023-01-11T22:54:21.6334507Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6334652Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6335060Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6335244Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6335639Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6335768Z _lazy_init(state, module) 2023-01-11T22:54:21.6336151Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6336334Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6336743Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6336895Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6337268Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6337396Z return func(*args, **kwargs) 2023-01-11T22:54:21.6337800Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6337906Z p_assert( 2023-01-11T22:54:21.6338269Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6338507Z traceback.print_stack() 2023-01-11T22:54:21.6338740Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6338988Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6339861Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6340672Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6341461Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6342261Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6343057Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6343843Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6344643Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6345437Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6345573Z File "", line 1, in 2023-01-11T22:54:21.6345803Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6345956Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6346172Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6346333Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6346542Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6346654Z self.run() 2023-01-11T22:54:21.6346873Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6347027Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6347402Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6347543Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6347932Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6348121Z getattr(self, test_name)() 2023-01-11T22:54:21.6348499Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6348598Z fn() 2023-01-11T22:54:21.6348987Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6349117Z test(self, **param_kwargs) 2023-01-11T22:54:21.6349553Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6349691Z return func(*args, **kwargs) 2023-01-11T22:54:21.6349965Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6350064Z self.run_subtests( 2023-01-11T22:54:21.6350454Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6350629Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6351024Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6351189Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6351595Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6351722Z output = model(*input) 2023-01-11T22:54:21.6352076Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6352219Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6352605Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6352791Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6353188Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6353318Z _lazy_init(state, module) 2023-01-11T22:54:21.6353697Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6353875Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6354306Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6354458Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6354802Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6354935Z return func(*args, **kwargs) 2023-01-11T22:54:21.6355342Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6355452Z p_assert( 2023-01-11T22:54:21.6355814Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6355945Z traceback.print_stack() 2023-01-11T22:54:21.6356082Z File "", line 1, in 2023-01-11T22:54:21.6356288Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6356440Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6356660Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6356820Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6357048Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6357155Z self.run() 2023-01-11T22:54:21.6357371Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6357527Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6357877Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6358083Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6358483Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6358615Z getattr(self, test_name)() 2023-01-11T22:54:21.6359050Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6359158Z fn() 2023-01-11T22:54:21.6359558Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6359687Z test(self, **param_kwargs) 2023-01-11T22:54:21.6360054Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6360185Z return func(*args, **kwargs) 2023-01-11T22:54:21.6360462Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6360582Z self.run_subtests( 2023-01-11T22:54:21.6360962Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6361134Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6361532Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6361696Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6362087Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6362213Z output = model(*input) 2023-01-11T22:54:21.6362564Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6362709Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6363119Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6363304Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6363698Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6363826Z _lazy_init(state, module) 2023-01-11T22:54:21.6364192Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6364372Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6364803Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6364951Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6365320Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6365455Z return func(*args, **kwargs) 2023-01-11T22:54:21.6365863Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6365971Z p_assert( 2023-01-11T22:54:21.6366314Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6366447Z traceback.print_stack() 2023-01-11T22:54:21.6366702Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6366950Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6367197Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6367445Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6367691Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6367996Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6368227Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6368473Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6368718Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6369053Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6369301Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6369546Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6369793Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6370036Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6370288Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6370512Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6370759Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6371006Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6371252Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6371496Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6371740Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6371982Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6372231Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6372456Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6373726Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:237: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.6373876Z (rank, world_num_valid_indices[rank]) 2023-01-11T22:54:21.6374972Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:237: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.6375121Z (rank, world_num_valid_indices[rank]) 2023-01-11T22:54:21.6375367Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6375616Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6375863Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6376109Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6376351Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6376594Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6376907Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6377150Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6377395Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6377637Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6377938Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6378188Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6378435Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6378676Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6378902Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6379149Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6379399Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6379644Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6379886Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6380135Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6380379Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6380622Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6380867Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6381092Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6381930Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6382729Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6383524Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6384319Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6385116Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6385907Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6386766Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6387596Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6388398Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6389196Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6389985Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6390772Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6391568Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6392360Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6393151Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6393948Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6394747Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6395536Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6396390Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6397226Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6398024Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6398813Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6399603Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6400391Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6401184Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6401971Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6402760Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6403553Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6404345Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6405134Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6405997Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6406832Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6407629Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6408417Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6409211Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6409999Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6410790Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6411576Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6412369Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6413307Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6414113Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6414902Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6415780Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6416631Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6417437Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6418229Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6419023Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6419812Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6420602Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6421405Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6422194Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6422981Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6423104Z dist init r=0, world=2 2023-01-11T22:54:21.6423461Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6423810Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6424135Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6424479Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6424884Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6425224Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6425605Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6425947Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6426290Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6426634Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.6426974Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6427315Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6427437Z dist init r=1, world=2 2023-01-11T22:54:21.6427785Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6428114Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6428462Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6428805Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6429149Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6429486Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6429828Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6430171Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:54:21.6430517Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6430862Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6431204Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6431546Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6431633Z ok (5.414s) 2023-01-11T22:54:21.6431984Z test_nested_wrapped_model_offload_true_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88781 2023-01-11T22:54:21.6432275Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88782 2023-01-11T22:54:21.6432679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.6432864Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.6433313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.6433515Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.6433916Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.6434101Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.6434485Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 
2023-01-11T22:54:21.6434690Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.6434948Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.6435204Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.6435632Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.6436054Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.6436297Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.6436539Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.6436786Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6437020Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6438113Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.6438234Z warnings.warn( 2023-01-11T22:54:21.6439319Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.6439439Z warnings.warn( 2023-01-11T22:54:21.6439575Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.6439800Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6439946Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6440164Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6440324Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6440536Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6440642Z self.run() 2023-01-11T22:54:21.6440855Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6441009Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6441483Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6441622Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6442013Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6442145Z getattr(self, test_name)() 2023-01-11T22:54:21.6442561Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6442668Z fn() 2023-01-11T22:54:21.6443071Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6443202Z test(self, **param_kwargs) 2023-01-11T22:54:21.6443587Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6443720Z return func(*args, **kwargs)
2023-01-11T22:54:21.6443997Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6444115Z self.run_subtests( 2023-01-11T22:54:21.6444477Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6444652Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6445059Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6445224Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6445627Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6445748Z output = model(*input) 2023-01-11T22:54:21.6446096Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6446236Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6446628Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6446812Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6447209Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6447337Z _lazy_init(state, module) 2023-01-11T22:54:21.6447723Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6447897Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6448325Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6448469Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6448815Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6448947Z return func(*args, **kwargs) 2023-01-11T22:54:21.6449353Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6449452Z p_assert( 2023-01-11T22:54:21.6449814Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6449949Z traceback.print_stack() 2023-01-11T22:54:21.6450087Z File "", line 1, in 2023-01-11T22:54:21.6450310Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6450442Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6450662Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6450824Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6451052Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6451217Z self.run() 2023-01-11T22:54:21.6451438Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6451595Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6451949Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6452087Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6452525Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6452661Z getattr(self, test_name)() 2023-01-11T22:54:21.6453277Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6453386Z fn() 2023-01-11T22:54:21.6453786Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:54:21.6453921Z test(self, **param_kwargs) 2023-01-11T22:54:21.6454286Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6454419Z return func(*args, **kwargs) 2023-01-11T22:54:21.6454693Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6454812Z self.run_subtests( 2023-01-11T22:54:21.6455191Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6455363Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6455760Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6455923Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6456312Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6456442Z output = model(*input) 2023-01-11T22:54:21.6456793Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6456938Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6457341Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6457528Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6457921Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6458048Z _lazy_init(state, module) 2023-01-11T22:54:21.6458410Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6458589Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6459026Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6459178Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6459542Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6459673Z return func(*args, **kwargs) 2023-01-11T22:54:21.6460082Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6460194Z p_assert( 2023-01-11T22:54:21.6460535Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6460669Z traceback.print_stack() 2023-01-11T22:54:21.6460921Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6461170Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6461396Z File "", line 1, in 2023-01-11T22:54:21.6461624Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6461773Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6461992Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6462134Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6462419Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6462531Z self.run() 2023-01-11T22:54:21.6462750Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6462905Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6463280Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6463421Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6463809Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6463928Z getattr(self, test_name)() 2023-01-11T22:54:21.6464317Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6464422Z fn() 2023-01-11T22:54:21.6464817Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6464950Z test(self, **param_kwargs) 2023-01-11T22:54:21.6465335Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6465467Z return func(*args, **kwargs) 2023-01-11T22:54:21.6465719Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6465838Z self.run_subtests( 2023-01-11T22:54:21.6466216Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6466391Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6466788Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6466948Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6467357Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6467482Z output = model(*input) 2023-01-11T22:54:21.6467833Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6467960Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6468365Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6468551Z args, kwargs = 
_root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6468952Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6469094Z _lazy_init(state, module) 2023-01-11T22:54:21.6469472Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6469651Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6470081Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6470215Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6470582Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6470714Z return func(*args, **kwargs) 2023-01-11T22:54:21.6471120Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6471298Z p_assert( 2023-01-11T22:54:21.6471662Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6471791Z traceback.print_stack() 2023-01-11T22:54:21.6471926Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.6472135Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6472335Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6472559Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6472721Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6472952Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6473060Z self.run() 2023-01-11T22:54:21.6473279Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6473414Z self._target(*self._args,
**self._kwargs) 2023-01-11T22:54:21.6473793Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6473933Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6474325Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6474456Z getattr(self, test_name)() 2023-01-11T22:54:21.6474847Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6474950Z fn() 2023-01-11T22:54:21.6475344Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6475456Z test(self, **param_kwargs) 2023-01-11T22:54:21.6475846Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6475978Z return func(*args, **kwargs) 2023-01-11T22:54:21.6476256Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6476379Z self.run_subtests( 2023-01-11T22:54:21.6476763Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6476933Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6477328Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6477474Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6477878Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6478002Z output = model(*input) 2023-01-11T22:54:21.6478355Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6478504Z 
return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6478913Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6479100Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6479494Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6479607Z _lazy_init(state, module) 2023-01-11T22:54:21.6479988Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6480166Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6480594Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6480743Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6481108Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6481300Z return func(*args, **kwargs) 2023-01-11T22:54:21.6481739Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6481830Z p_assert( 2023-01-11T22:54:21.6482192Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6482375Z traceback.print_stack() 2023-01-11T22:54:21.6482628Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6482879Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6483017Z File "", line 1, in 2023-01-11T22:54:21.6483243Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6483397Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6483599Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6483760Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6484104Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6484212Z self.run() 2023-01-11T22:54:21.6484433Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6484594Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6484969Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6485091Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6485480Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6485611Z getattr(self, test_name)() 2023-01-11T22:54:21.6486002Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6486106Z fn() 2023-01-11T22:54:21.6486500Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6486630Z test(self, **param_kwargs) 2023-01-11T22:54:21.6487011Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6487126Z return func(*args, **kwargs) 2023-01-11T22:54:21.6487401Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6487519Z self.run_subtests( 2023-01-11T22:54:21.6487901Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6488073Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6488470Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6488637Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6489046Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6489153Z output = model(*input) 2023-01-11T22:54:21.6489505Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6489657Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6490060Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6490245Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6490645Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6490774Z _lazy_init(state, module) 2023-01-11T22:54:21.6491224Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6491404Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6491813Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6491962Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6492375Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6492510Z return func(*args, **kwargs) 2023-01-11T22:54:21.6493129Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6493241Z p_assert( 2023-01-11T22:54:21.6493610Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6493748Z traceback.print_stack() 2023-01-11T22:54:21.6493866Z File "", line 1, in 2023-01-11T22:54:21.6494092Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6494243Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6494456Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6494616Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6494851Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6494959Z self.run() 2023-01-11T22:54:21.6495156Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6495313Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6495681Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6495818Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6496209Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6496338Z getattr(self, test_name)() 2023-01-11T22:54:21.6496724Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6496826Z fn() 2023-01-11T22:54:21.6497205Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6497337Z test(self, **param_kwargs) 2023-01-11T22:54:21.6497723Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6497853Z return func(*args, **kwargs) 2023-01-11T22:54:21.6498124Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6498244Z self.run_subtests( 2023-01-11T22:54:21.6498626Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6498802Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6499177Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6499334Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6499747Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6499872Z output = model(*input) 2023-01-11T22:54:21.6500226Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6500374Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6500779Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6500963Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6501438Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6501566Z _lazy_init(state, module) 2023-01-11T22:54:21.6501946Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6502122Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6502609Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6502764Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6503130Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6503267Z return func(*args, **kwargs) 2023-01-11T22:54:21.6503656Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6503767Z p_assert( 2023-01-11T22:54:21.6504131Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6504264Z traceback.print_stack() 2023-01-11T22:54:21.6504517Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6504770Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6505569Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6506367Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6506507Z File "", line 1, in 2023-01-11T22:54:21.6506734Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6506865Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6507084Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6507245Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6507474Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6507582Z self.run() 2023-01-11T22:54:21.6507799Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6507955Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6508307Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6508451Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6508846Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6508976Z getattr(self, test_name)() 2023-01-11T22:54:21.6509363Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6509468Z fn() 2023-01-11T22:54:21.6509861Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6509993Z test(self, **param_kwargs) 2023-01-11T22:54:21.6510362Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6510493Z return func(*args, **kwargs) 2023-01-11T22:54:21.6510766Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6510949Z self.run_subtests( 2023-01-11T22:54:21.6511333Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6511504Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6511901Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6512111Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6512516Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6512641Z output = model(*input) 2023-01-11T22:54:21.6512993Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6513139Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6513548Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6513732Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6514130Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6514261Z _lazy_init(state, module) 2023-01-11T22:54:21.6514629Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6514807Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6515236Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6515386Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6515750Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6515884Z return func(*args, **kwargs) 2023-01-11T22:54:21.6516292Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6516400Z p_assert( 2023-01-11T22:54:21.6516744Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6516878Z traceback.print_stack() 2023-01-11T22:54:21.6517019Z File "", line 1, in 2023-01-11T22:54:21.6517245Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6517399Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6517620Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6517781Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6518007Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6518103Z self.run() 2023-01-11T22:54:21.6518321Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6518478Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6518845Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6518988Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6519380Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6519510Z getattr(self, test_name)() 2023-01-11T22:54:21.6519897Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6519980Z fn() 2023-01-11T22:54:21.6520379Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6520510Z test(self, **param_kwargs) 2023-01-11T22:54:21.6520972Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6521104Z return func(*args, **kwargs) 2023-01-11T22:54:21.6521379Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6521496Z self.run_subtests( 2023-01-11T22:54:21.6521906Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6522087Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6522485Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6522646Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6523054Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6523189Z output = model(*input) 2023-01-11T22:54:21.6523544Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6523687Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6524077Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6524261Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6524660Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6524789Z _lazy_init(state, module) 2023-01-11T22:54:21.6525171Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6525350Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6525783Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6525937Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6526300Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6526414Z return func(*args, **kwargs) 2023-01-11T22:54:21.6526823Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6526939Z p_assert( 2023-01-11T22:54:21.6527297Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6527427Z traceback.print_stack() 2023-01-11T22:54:21.6527675Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6527922Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6528059Z File "", line 1, in 2023-01-11T22:54:21.6528268Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6528418Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6528634Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6528796Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6529027Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6529138Z self.run() 2023-01-11T22:54:21.6529357Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6529492Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6529865Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6530003Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6530393Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6530586Z getattr(self, test_name)() 2023-01-11T22:54:21.6530980Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6531084Z fn() 2023-01-11T22:54:21.6531479Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6531590Z test(self, **param_kwargs) 2023-01-11T22:54:21.6532028Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6532166Z return func(*args, **kwargs) 2023-01-11T22:54:21.6532440Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6532559Z self.run_subtests( 2023-01-11T22:54:21.6533079Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6533259Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6533655Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6533800Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6534203Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6534333Z output = model(*input) 2023-01-11T22:54:21.6534689Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6534835Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6535241Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6535427Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6535824Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.6535937Z _lazy_init(state, module) 2023-01-11T22:54:21.6536315Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6536494Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6536926Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6537077Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6537446Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6537577Z return func(*args, **kwargs) 2023-01-11T22:54:21.6537980Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6538072Z p_assert( 2023-01-11T22:54:21.6538436Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6538568Z traceback.print_stack() 2023-01-11T22:54:21.6538702Z File "", line 1, in 2023-01-11T22:54:21.6538926Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6539076Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6539298Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6539458Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6539668Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6539778Z self.run() 2023-01-11T22:54:21.6539995Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6540150Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6540521Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6540766Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.6541163Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6541276Z getattr(self, test_name)() 2023-01-11T22:54:21.6541727Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6541836Z fn() 2023-01-11T22:54:21.6542237Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6542369Z test(self, **param_kwargs) 2023-01-11T22:54:21.6542752Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6542885Z return func(*args, **kwargs) 2023-01-11T22:54:21.6543157Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6543262Z self.run_subtests( 2023-01-11T22:54:21.6543644Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6543816Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6544214Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6544375Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6544783Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6544911Z output = model(*input) 2023-01-11T22:54:21.6545267Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6545395Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6545806Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.6545995Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6546390Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6546522Z _lazy_init(state, module) 2023-01-11T22:54:21.6546910Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6547091Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6547519Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6547669Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6548013Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6548150Z return func(*args, **kwargs) 2023-01-11T22:54:21.6548556Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6548664Z p_assert( 2023-01-11T22:54:21.6549027Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6549163Z traceback.print_stack() 2023-01-11T22:54:21.6549416Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6549644Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6549782Z File "", line 1, in 2023-01-11T22:54:21.6550008Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6550159Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6550375Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6550684Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6550916Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6551025Z self.run() 2023-01-11T22:54:21.6551225Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6551380Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6551846Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6551992Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6552390Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6552522Z getattr(self, test_name)() 2023-01-11T22:54:21.6552914Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6553019Z fn() 2023-01-11T22:54:21.6553398Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6553531Z test(self, **param_kwargs) 2023-01-11T22:54:21.6553922Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6554054Z return func(*args, **kwargs) 2023-01-11T22:54:21.6554330Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6554451Z self.run_subtests( 2023-01-11T22:54:21.6554831Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6555002Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6555375Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6555539Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6555946Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6556070Z output = model(*input) 2023-01-11T22:54:21.6556422Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6556570Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6556978Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6557165Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6557541Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6557668Z _lazy_init(state, module) 2023-01-11T22:54:21.6558049Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6558227Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6558656Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6558805Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6559170Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6559304Z return func(*args, **kwargs) 2023-01-11T22:54:21.6559693Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6559800Z p_assert( 2023-01-11T22:54:21.6560164Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6560298Z traceback.print_stack() 2023-01-11T22:54:21.6560435Z File "", line 1, in 2023-01-11T22:54:21.6560721Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6560874Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6561091Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6561234Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6561504Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6561616Z self.run() 2023-01-11T22:54:21.6561836Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6561991Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6562365Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6562507Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6562875Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6563012Z getattr(self, test_name)() 2023-01-11T22:54:21.6563402Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6563503Z fn() 2023-01-11T22:54:21.6563899Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6564032Z test(self, **param_kwargs) 2023-01-11T22:54:21.6564419Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6564553Z return func(*args, **kwargs) 2023-01-11T22:54:21.6564808Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6564927Z self.run_subtests( 2023-01-11T22:54:21.6565309Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6565486Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6565882Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6566045Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6566455Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6566582Z output = model(*input) 2023-01-11T22:54:21.6566914Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6567061Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6567474Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6567663Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6568065Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6568194Z _lazy_init(state, module) 2023-01-11T22:54:21.6568574Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6568757Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6569172Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6569323Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6569691Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6569824Z return func(*args, **kwargs) 2023-01-11T22:54:21.6570233Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6570400Z p_assert( 2023-01-11T22:54:21.6570768Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6570902Z traceback.print_stack() 2023-01-11T22:54:21.6571131Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6571381Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6571571Z File "", line 1, in 2023-01-11T22:54:21.6571801Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6571953Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6572174Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6572334Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6572566Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6572660Z self.run() 2023-01-11T22:54:21.6573059Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6573222Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6573602Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6573744Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6574138Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6574271Z getattr(self, test_name)() 2023-01-11T22:54:21.6574659Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6574743Z fn() 2023-01-11T22:54:21.6575138Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6575269Z test(self, **param_kwargs) 2023-01-11T22:54:21.6575661Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6575794Z return func(*args, **kwargs) 2023-01-11T22:54:21.6576067Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6576187Z self.run_subtests( 2023-01-11T22:54:21.6576551Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6576723Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6577118Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6577281Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6577686Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6577813Z output = model(*input) 2023-01-11T22:54:21.6578167Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6578313Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6578717Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6578883Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6579283Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.6579412Z _lazy_init(state, module) 2023-01-11T22:54:21.6579796Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6579975Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6580405Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6580642Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6581012Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6581126Z return func(*args, **kwargs) 2023-01-11T22:54:21.6581534Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6581702Z p_assert( 2023-01-11T22:54:21.6582079Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6582236Z traceback.print_stack() 2023-01-11T22:54:21.6582374Z File "", line 1, in 2023-01-11T22:54:21.6582598Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6582729Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6582946Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6583111Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6583346Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6583453Z self.run() 2023-01-11T22:54:21.6583673Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6583828Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6584203Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6584326Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.6584717Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6584847Z getattr(self, test_name)() 2023-01-11T22:54:21.6585236Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6585341Z fn() 2023-01-11T22:54:21.6585736Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6585868Z test(self, **param_kwargs) 2023-01-11T22:54:21.6586256Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6586371Z return func(*args, **kwargs) 2023-01-11T22:54:21.6586646Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6586765Z self.run_subtests( 2023-01-11T22:54:21.6587142Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6587314Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6587705Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6587870Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6588277Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6588385Z output = model(*input) 2023-01-11T22:54:21.6588735Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6588886Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6589293Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.6589480Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6589873Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6590004Z _lazy_init(state, module) 2023-01-11T22:54:21.6590382Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6590606Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6591038Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6591189Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6591643Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6591780Z return func(*args, **kwargs) 2023-01-11T22:54:21.6592195Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6592305Z p_assert( 2023-01-11T22:54:21.6592667Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6592782Z traceback.print_stack() 2023-01-11T22:54:21.6593036Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6593282Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6594087Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6594882Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6595022Z File "", line 1, in 2023-01-11T22:54:21.6595247Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6595401Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6595620Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6595778Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6595988Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6596098Z self.run() 2023-01-11T22:54:21.6596317Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6596473Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6596838Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6596980Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6597368Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6597483Z getattr(self, test_name)() 2023-01-11T22:54:21.6597871Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6597973Z fn() 2023-01-11T22:54:21.6598366Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6598497Z test(self, **param_kwargs) 2023-01-11T22:54:21.6598891Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:54:21.6599025Z return func(*args, **kwargs) 2023-01-11T22:54:21.6599296Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6599396Z self.run_subtests( 2023-01-11T22:54:21.6599776Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6599952Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6600423Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6600587Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6600993Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6601120Z output = model(*input) 2023-01-11T22:54:21.6601523Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6601657Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6602067Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6602251Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6602650Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6602784Z _lazy_init(state, module) 2023-01-11T22:54:21.6603166Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6603344Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6603777Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6604043Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6604388Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6604522Z return func(*args, **kwargs) 2023-01-11T22:54:21.6604932Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6605041Z p_assert( 2023-01-11T22:54:21.6605407Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6605544Z traceback.print_stack() 2023-01-11T22:54:21.6605683Z File "", line 1, in 2023-01-11T22:54:21.6605890Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6606036Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6606253Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6606416Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6606646Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6606755Z self.run() 2023-01-11T22:54:21.6606974Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6607135Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6607488Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6607629Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6608018Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6608149Z getattr(self, test_name)() 2023-01-11T22:54:21.6608534Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6608638Z fn() 2023-01-11T22:54:21.6609041Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6609172Z test(self, **param_kwargs) 2023-01-11T22:54:21.6609542Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6609676Z return func(*args, **kwargs) 2023-01-11T22:54:21.6609949Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6610140Z self.run_subtests( 2023-01-11T22:54:21.6610525Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6610696Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6611091Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6611299Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6611700Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6611824Z output = model(*input) 2023-01-11T22:54:21.6612176Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6612321Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6612728Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6613103Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6613512Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6613643Z _lazy_init(state, module) 2023-01-11T22:54:21.6614011Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:54:21.6614192Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6614620Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6614771Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6615137Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6615269Z return func(*args, **kwargs) 2023-01-11T22:54:21.6615682Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6615792Z p_assert( 2023-01-11T22:54:21.6616136Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6616270Z traceback.print_stack() 2023-01-11T22:54:21.6616524Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6616772Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6616910Z File "", line 1, in 2023-01-11T22:54:21.6617135Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6617286Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6617504Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6617651Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6617879Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6617989Z self.run() 2023-01-11T22:54:21.6618208Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6618363Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6618736Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6618880Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6619252Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6619386Z getattr(self, test_name)() 2023-01-11T22:54:21.6619775Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6619878Z fn() 2023-01-11T22:54:21.6620367Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6620501Z test(self, **param_kwargs) 2023-01-11T22:54:21.6620888Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6621021Z return func(*args, **kwargs) 2023-01-11T22:54:21.6621335Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6621462Z self.run_subtests( 2023-01-11T22:54:21.6621850Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6622023Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6622417Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6622580Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6622991Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6623118Z output = model(*input) 2023-01-11T22:54:21.6623454Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6623601Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6624011Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6624197Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6624595Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6624723Z _lazy_init(state, module) 2023-01-11T22:54:21.6625104Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6625286Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6625698Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6625849Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6626214Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6626351Z return func(*args, **kwargs) 2023-01-11T22:54:21.6626759Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6626868Z p_assert( 2023-01-11T22:54:21.6627235Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6627371Z traceback.print_stack() 2023-01-11T22:54:21.6627489Z File "", line 1, in 2023-01-11T22:54:21.6627716Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6627874Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6628095Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6628255Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6628485Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6628592Z self.run() 2023-01-11T22:54:21.6628812Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6628950Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6629316Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6629457Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6629844Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6630041Z getattr(self, test_name)() 2023-01-11T22:54:21.6630437Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6630539Z fn() 2023-01-11T22:54:21.6630915Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6631047Z test(self, **param_kwargs) 2023-01-11T22:54:21.6631487Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6631625Z return func(*args, **kwargs) 2023-01-11T22:54:21.6631901Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6632022Z self.run_subtests( 2023-01-11T22:54:21.6632408Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6632588Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6632963Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6633127Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6633536Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6633663Z output = model(*input) 2023-01-11T22:54:21.6634022Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6634175Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6634579Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6634765Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6635163Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6635277Z _lazy_init(state, module) 2023-01-11T22:54:21.6635660Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6635840Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6636273Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6636425Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6636789Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6636921Z return func(*args, **kwargs) 2023-01-11T22:54:21.6637328Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6637418Z p_assert( 2023-01-11T22:54:21.6637787Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6637919Z traceback.print_stack() 2023-01-11T22:54:21.6638168Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6638416Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6638552Z File "", line 1, in 2023-01-11T22:54:21.6638781Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6638913Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6639130Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6639291Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6639519Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6639627Z self.run() 2023-01-11T22:54:21.6639910Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6640066Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6640441Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6640564Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6640955Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6641137Z getattr(self, test_name)() 2023-01-11T22:54:21.6641538Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6641642Z fn() 2023-01-11T22:54:21.6642038Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6642170Z test(self, **param_kwargs) 2023-01-11T22:54:21.6642553Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6642671Z return func(*args, **kwargs) 2023-01-11T22:54:21.6642946Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6643064Z self.run_subtests( 2023-01-11T22:54:21.6643446Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6643621Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6644017Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6644175Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6644577Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6644683Z output = model(*input) 2023-01-11T22:54:21.6645040Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6645188Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6645595Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6645779Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6646177Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.6646306Z _lazy_init(state, module) 2023-01-11T22:54:21.6646688Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6646850Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6647281Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6647437Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6647802Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6647934Z return func(*args, **kwargs) 2023-01-11T22:54:21.6648343Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6648453Z p_assert( 2023-01-11T22:54:21.6648820Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6648935Z traceback.print_stack() 2023-01-11T22:54:21.6649073Z File "", line 1, in 2023-01-11T22:54:21.6649300Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6649451Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6649668Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6649892Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6650120Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6650229Z self.run() 2023-01-11T22:54:21.6650429Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6650584Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6651014Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6651157Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.6651550Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6651682Z getattr(self, test_name)() 2023-01-11T22:54:21.6652070Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6652152Z fn() 2023-01-11T22:54:21.6652552Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6652683Z test(self, **param_kwargs) 2023-01-11T22:54:21.6653292Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6653428Z return func(*args, **kwargs) 2023-01-11T22:54:21.6653708Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6653831Z self.run_subtests( 2023-01-11T22:54:21.6654214Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6654367Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6654761Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6654925Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6655332Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6655458Z output = model(*input) 2023-01-11T22:54:21.6655812Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6655958Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6656368Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.6656533Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6656928Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6657056Z _lazy_init(state, module) 2023-01-11T22:54:21.6657439Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6657623Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6658053Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6658204Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6658567Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6658703Z return func(*args, **kwargs) 2023-01-11T22:54:21.6659094Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6659202Z p_assert( 2023-01-11T22:54:21.6659566Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6659699Z traceback.print_stack() 2023-01-11T22:54:21.6659952Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6660292Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6660429Z File "", line 1, in 2023-01-11T22:54:21.6660635Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6660787Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6661063Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6661230Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6661460Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6661569Z self.run() 2023-01-11T22:54:21.6661788Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6661944Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6662303Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6662447Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6662841Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6662973Z getattr(self, test_name)() 2023-01-11T22:54:21.6663362Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6663465Z fn() 2023-01-11T22:54:21.6663860Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6663991Z test(self, **param_kwargs) 2023-01-11T22:54:21.6664358Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6664492Z return func(*args, **kwargs) 2023-01-11T22:54:21.6664765Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6664890Z self.run_subtests( 2023-01-11T22:54:21.6665270Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6665444Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6665838Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6666002Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6666389Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6666512Z output = model(*input) 2023-01-11T22:54:21.6666864Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6667011Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6667409Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6667599Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6667991Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6668118Z _lazy_init(state, module) 2023-01-11T22:54:21.6668501Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6668662Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6669090Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6669238Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6669599Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6669797Z return func(*args, **kwargs) 2023-01-11T22:54:21.6670211Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6670319Z p_assert( 2023-01-11T22:54:21.6670680Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6670795Z traceback.print_stack() 2023-01-11T22:54:21.6670928Z File "", line 1, in 2023-01-11T22:54:21.6671202Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6671362Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6671580Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6671740Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6671966Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6672056Z self.run() 2023-01-11T22:54:21.6672279Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6672435Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6672812Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6672949Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6673341Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6673472Z getattr(self, test_name)() 2023-01-11T22:54:21.6673860Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6673944Z fn() 2023-01-11T22:54:21.6674338Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6674467Z test(self, **param_kwargs) 2023-01-11T22:54:21.6674850Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6674986Z return func(*args, **kwargs) 2023-01-11T22:54:21.6675258Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6675377Z self.run_subtests( 2023-01-11T22:54:21.6675756Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6675913Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6676305Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6676467Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6676880Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6677004Z output = model(*input) 2023-01-11T22:54:21.6677362Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6677507Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6677916Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6678081Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6678477Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6678604Z _lazy_init(state, module) 2023-01-11T22:54:21.6678985Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6679163Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6679597Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6679827Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6680199Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6680313Z return func(*args, **kwargs) 2023-01-11T22:54:21.6680720Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6680828Z p_assert( 2023-01-11T22:54:21.6681243Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6681386Z traceback.print_stack() 2023-01-11T22:54:21.6681638Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6681887Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6682723Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6683531Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6684333Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6685126Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6685924Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6686718Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6687515Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6688314Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6688451Z File "", line 1, in 2023-01-11T22:54:21.6688659Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6688812Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6689034Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6689196Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6689490Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6689602Z self.run() 2023-01-11T22:54:21.6689822Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6689959Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6690335Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6690523Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6690933Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6691064Z getattr(self, test_name)() 2023-01-11T22:54:21.6691451Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6691551Z fn() 2023-01-11T22:54:21.6691948Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6692065Z test(self, **param_kwargs) 2023-01-11T22:54:21.6692453Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6692583Z return func(*args, **kwargs) 2023-01-11T22:54:21.6693031Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6693167Z self.run_subtests( 2023-01-11T22:54:21.6693554Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6693727Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6694120Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6694264Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6694664Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6694791Z output = model(*input) 2023-01-11T22:54:21.6695144Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6695289Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6695697Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6695882Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6696275Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6696384Z _lazy_init(state, module) 2023-01-11T22:54:21.6696763Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6696942Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6697378Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6697528Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6697890Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6698026Z return func(*args, **kwargs) 2023-01-11T22:54:21.6698433Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6698523Z p_assert( 2023-01-11T22:54:21.6698887Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6699019Z traceback.print_stack() 2023-01-11T22:54:21.6699157Z File "", line 1, in 2023-01-11T22:54:21.6699383Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6699626Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6699844Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6700005Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6700217Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6700324Z self.run() 2023-01-11T22:54:21.6700602Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6700764Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6701137Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6701279Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6701665Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6701794Z getattr(self, test_name)() 2023-01-11T22:54:21.6702166Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6702269Z fn() 2023-01-11T22:54:21.6702661Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6702795Z test(self, **param_kwargs) 2023-01-11T22:54:21.6703182Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6703315Z return func(*args, **kwargs) 2023-01-11T22:54:21.6703587Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6703687Z self.run_subtests( 2023-01-11T22:54:21.6704067Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6704239Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6704639Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6704804Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6705207Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6705333Z output = model(*input) 2023-01-11T22:54:21.6705688Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6705817Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6706224Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6706410Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6706804Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6706937Z _lazy_init(state, module) 2023-01-11T22:54:21.6707318Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6707495Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6707922Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6708076Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6708423Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6708557Z return func(*args, **kwargs) 2023-01-11T22:54:21.6708967Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6709077Z p_assert( 2023-01-11T22:54:21.6709436Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6709633Z traceback.print_stack() 2023-01-11T22:54:21.6709885Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6710132Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6710364Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6710656Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6710911Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6711154Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6711402Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6711649Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6711903Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6712148Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6712377Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6712623Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6712870Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6713115Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6713355Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6713601Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6713844Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6714094Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6714319Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6714563Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6714807Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6715051Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6715293Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6715536Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6716348Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6717154Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6717951Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6718740Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6719605Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6720446Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6721255Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6722054Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6722850Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6723643Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6724439Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6725233Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6726025Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6726825Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6727618Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6728410Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6729278Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6730123Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6730927Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6731725Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6731977Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6732207Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6732460Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6732708Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6733193Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6733448Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6733696Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6733942Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6734190Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6734420Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6734666Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6734908Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6735153Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6735392Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6735639Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6735881Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6736124Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6736370Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6736597Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6736842Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6737086Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6737329Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6737657Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6737900Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6739085Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 
2023-01-11T22:54:21.6739209Z world_indices[ 2023-01-11T22:54:21.6740015Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6740816Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6741611Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6742402Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6743196Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6743990Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6744111Z dist init r=1, world=2 2023-01-11T22:54:21.6744446Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6744796Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6745140Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6745485Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6745829Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6746169Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6746507Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6746960Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6747304Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6747688Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6748039Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6748383Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.6748486Z dist init r=0, world=2 2023-01-11T22:54:21.6748835Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6749181Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6749527Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6749875Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6750218Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6750565Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6750906Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6751251Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6751594Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6751935Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6752259Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6752604Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.6752710Z ok (5.815s) 2023-01-11T22:54:21.6753077Z test_nested_wrapped_model_offload_true_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88864 2023-01-11T22:54:21.6753310Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88865 2023-01-11T22:54:21.6753723Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.6753911Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.6754305Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.6754554Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.6754948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.6755149Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.6755604Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.6755812Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.6756071Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.6756502Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.6756760Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.6757189Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.6757432Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.6757656Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.6757904Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6758149Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6759239Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.6759360Z warnings.warn( 2023-01-11T22:54:21.6760446Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.6760563Z warnings.warn( 2023-01-11T22:54:21.6760703Z File "", line 1, in 2023-01-11T22:54:21.6760932Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6761082Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6761280Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6761444Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6761673Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6761784Z self.run() 2023-01-11T22:54:21.6762002Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6762156Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6762295Z File "", line 1, in 2023-01-11T22:54:21.6762670Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6762793Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6763185Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6763316Z getattr(self, test_name)() 2023-01-11T22:54:21.6763542Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6763760Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6764157Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6764260Z fn() 2023-01-11T22:54:21.6764480Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6764623Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6765072Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6765208Z test(self, **param_kwargs) 2023-01-11T22:54:21.6765441Z File 
"/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6765550Z self.run() 2023-01-11T22:54:21.6765945Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6766079Z return func(*args, **kwargs) 2023-01-11T22:54:21.6766282Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6766438Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6766710Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6766828Z self.run_subtests( 2023-01-11T22:54:21.6767196Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6767337Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6767721Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6767895Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6768264Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6768398Z getattr(self, test_name)() 2023-01-11T22:54:21.6768796Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6768958Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6769343Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6769444Z fn() 2023-01-11T22:54:21.6769850Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6769975Z output = model(*input) 2023-01-11T22:54:21.6770351Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6770480Z test(self, **param_kwargs) 2023-01-11T22:54:21.6770828Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6770974Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6771359Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6771492Z return func(*args, **kwargs) 2023-01-11T22:54:21.6771899Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6772083Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6772342Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6772460Z self.run_subtests( 2023-01-11T22:54:21.6773049Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6773190Z _lazy_init(state, module) 2023-01-11T22:54:21.6773566Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6773826Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6774196Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6774373Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6774720Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6774939Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6775362Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, 
in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6775510Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6775888Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6776010Z output = model(*input) 2023-01-11T22:54:21.6776358Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6776489Z return func(*args, **kwargs) 2023-01-11T22:54:21.6776797Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6776937Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6777322Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6777428Z p_assert( 2023-01-11T22:54:21.6777806Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6777981Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6778322Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6778451Z traceback.print_stack() 2023-01-11T22:54:21.6778804Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6778927Z _lazy_init(state, module) 2023-01-11T22:54:21.6779282Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6779452Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6779853Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6779998Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.6780342Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6780470Z return func(*args, **kwargs) 2023-01-11T22:54:21.6780847Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6780938Z p_assert( 2023-01-11T22:54:21.6781278Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6781409Z traceback.print_stack() 2023-01-11T22:54:21.6781652Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6781889Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6782026Z File "", line 1, in 2023-01-11T22:54:21.6782239Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6782366Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6782570Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6782722Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6782935Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6783106Z self.run() 2023-01-11T22:54:21.6783338Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6783490Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6783841Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6783959Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6784378Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6784513Z getattr(self, test_name)() 2023-01-11T22:54:21.6784883Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6784984Z fn() 2023-01-11T22:54:21.6785350Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6785477Z test(self, **param_kwargs) 2023-01-11T22:54:21.6785837Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6785945Z return func(*args, **kwargs) 2023-01-11T22:54:21.6786199Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6786314Z self.run_subtests( 2023-01-11T22:54:21.6786670Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6786835Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6787204Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6787360Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6787736Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6787843Z output = model(*input) 2023-01-11T22:54:21.6788170Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6788312Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6788690Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6788865Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6789232Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.6789359Z _lazy_init(state, module) 2023-01-11T22:54:21.6789713Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6789864Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6790264Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6790412Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6790750Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6790878Z return func(*args, **kwargs) 2023-01-11T22:54:21.6791260Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6791368Z p_assert( 2023-01-11T22:54:21.6791712Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6791821Z traceback.print_stack() 2023-01-11T22:54:21.6791956Z File "", line 1, in 2023-01-11T22:54:21.6792172Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6792317Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6792524Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6792745Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6792963Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6793051Z self.run() 2023-01-11T22:54:21.6793260Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6793406Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6793804Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6793945Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.6794318Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6794447Z getattr(self, test_name)() 2023-01-11T22:54:21.6794811Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6794896Z fn() 2023-01-11T22:54:21.6795265Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6795391Z test(self, **param_kwargs) 2023-01-11T22:54:21.6795749Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6795878Z return func(*args, **kwargs) 2023-01-11T22:54:21.6796136Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6796251Z self.run_subtests( 2023-01-11T22:54:21.6796608Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6796753Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6797119Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6797280Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6797659Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6797782Z output = model(*input) 2023-01-11T22:54:21.6798109Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6798251Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6798636Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.6798795Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6799166Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6799290Z _lazy_init(state, module) 2023-01-11T22:54:21.6799647Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6799822Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6800223Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6800369Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6800716Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6800826Z return func(*args, **kwargs) 2023-01-11T22:54:21.6801204Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6801308Z p_assert( 2023-01-11T22:54:21.6801648Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6801776Z traceback.print_stack() 2023-01-11T22:54:21.6802078Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6802317Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6802448Z File "", line 1, in 2023-01-11T22:54:21.6802642Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6802787Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6803032Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6803191Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6803407Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6803514Z self.run() 2023-01-11T22:54:21.6803720Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6803870Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6804203Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6804343Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6804709Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6804835Z getattr(self, test_name)() 2023-01-11T22:54:21.6805201Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6805306Z fn() 2023-01-11T22:54:21.6805677Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6805802Z test(self, **param_kwargs) 2023-01-11T22:54:21.6806143Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6806271Z return func(*args, **kwargs) 2023-01-11T22:54:21.6806524Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6806645Z self.run_subtests( 2023-01-11T22:54:21.6806999Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6807162Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6807531Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6807685Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6808044Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6808165Z output = model(*input) 2023-01-11T22:54:21.6808495Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6808636Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6809018Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6809196Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6809564Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6809686Z _lazy_init(state, module) 2023-01-11T22:54:21.6810026Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6810197Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6810597Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6810742Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6811081Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6811282Z return func(*args, **kwargs) 2023-01-11T22:54:21.6811672Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6811779Z p_assert( 2023-01-11T22:54:21.6812097Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6812229Z traceback.print_stack() 2023-01-11T22:54:21.6812409Z File "", line 1, in 2023-01-11T22:54:21.6812630Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6812776Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6813287Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6813447Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6813643Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6813756Z self.run() 2023-01-11T22:54:21.6813963Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6814109Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6814460Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6814597Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6814967Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6815092Z getattr(self, test_name)() 2023-01-11T22:54:21.6815437Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6815537Z fn() 2023-01-11T22:54:21.6815905Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6816029Z test(self, **param_kwargs) 2023-01-11T22:54:21.6816389Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6816515Z return func(*args, **kwargs) 2023-01-11T22:54:21.6816770Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6816885Z self.run_subtests( 2023-01-11T22:54:21.6817222Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6817389Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6817754Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6817908Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6818285Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6818410Z output = model(*input) 2023-01-11T22:54:21.6818736Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6818876Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6819237Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6819415Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6819787Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6819911Z _lazy_init(state, module) 2023-01-11T22:54:21.6820265Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6820435Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6820836Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6821070Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6821399Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6821526Z return func(*args, **kwargs) 2023-01-11T22:54:21.6821965Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6822078Z p_assert( 2023-01-11T22:54:21.6822424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6822554Z traceback.print_stack() 2023-01-11T22:54:21.6822801Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6823037Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6823792Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6824529Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6824661Z File "", line 1, in 2023-01-11T22:54:21.6824874Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6825018Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6825225Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6825383Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6825598Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6825704Z self.run() 2023-01-11T22:54:21.6825890Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6826037Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6826384Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6826518Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6826885Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6827011Z getattr(self, test_name)() 2023-01-11T22:54:21.6827370Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6827471Z fn() 2023-01-11T22:54:21.6827820Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6827950Z test(self, **param_kwargs) 2023-01-11T22:54:21.6828307Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6828432Z return func(*args, **kwargs) 2023-01-11T22:54:21.6828689Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6828806Z self.run_subtests( 2023-01-11T22:54:21.6829159Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6829324Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6829670Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6829826Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6830278Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6830401Z output = model(*input) 2023-01-11T22:54:21.6830732Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6830873Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6831297Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6831483Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6831841Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6831965Z _lazy_init(state, module) 2023-01-11T22:54:21.6832324Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6832499Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6832894Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6833041Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6833380Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6833510Z return func(*args, **kwargs) 2023-01-11T22:54:21.6833888Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6833974Z p_assert( 2023-01-11T22:54:21.6834310Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6834439Z traceback.print_stack() 2023-01-11T22:54:21.6834568Z File "", line 1, in 2023-01-11T22:54:21.6834776Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6834923Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6835128Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6835261Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6835475Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6835578Z self.run() 2023-01-11T22:54:21.6835783Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6835931Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6836272Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6836405Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6836768Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6836880Z getattr(self, test_name)() 2023-01-11T22:54:21.6837244Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6837345Z fn() 2023-01-11T22:54:21.6837711Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6837838Z test(self, **param_kwargs) 2023-01-11T22:54:21.6838198Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6838324Z return func(*args, **kwargs) 2023-01-11T22:54:21.6838580Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6838677Z self.run_subtests( 2023-01-11T22:54:21.6839029Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6839259Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6839632Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6839793Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6840168Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6840292Z output = model(*input) 2023-01-11T22:54:21.6840669Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6840797Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6841185Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6841364Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6841732Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6841857Z _lazy_init(state, module) 2023-01-11T22:54:21.6842210Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6842380Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6842782Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6842909Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6843248Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6843375Z return func(*args, **kwargs) 2023-01-11T22:54:21.6843752Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6843855Z p_assert( 2023-01-11T22:54:21.6844197Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6844324Z traceback.print_stack() 2023-01-11T22:54:21.6844560Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6844781Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6844912Z File "", line 1, in 2023-01-11T22:54:21.6845125Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6845270Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6845473Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6845624Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6845841Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6845929Z self.run() 2023-01-11T22:54:21.6846133Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6846282Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6846629Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6846765Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6847127Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6847259Z getattr(self, test_name)() 2023-01-11T22:54:21.6847619Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6847701Z fn() 2023-01-11T22:54:21.6848067Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6848191Z test(self, **param_kwargs) 2023-01-11T22:54:21.6848550Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6848742Z return func(*args, **kwargs) 2023-01-11T22:54:21.6848997Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6849115Z self.run_subtests( 2023-01-11T22:54:21.6849475Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6849677Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6850058Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6850214Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6850594Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6850716Z output = model(*input) 2023-01-11T22:54:21.6851047Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6851189Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6851564Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6851722Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6852095Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.6852252Z _lazy_init(state, module) 2023-01-11T22:54:21.6852611Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6852782Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6853513Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6853666Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6854006Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6854114Z return func(*args, **kwargs) 2023-01-11T22:54:21.6854492Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6854597Z p_assert( 2023-01-11T22:54:21.6854936Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6855064Z traceback.print_stack() 2023-01-11T22:54:21.6855194Z File "", line 1, in 2023-01-11T22:54:21.6855406Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6855551Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6855737Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6855895Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6856110Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6856215Z self.run() 2023-01-11T22:54:21.6856418Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6856565Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6856915Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6857050Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.6857395Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6857523Z getattr(self, test_name)() 2023-01-11T22:54:21.6857884Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6857983Z fn() 2023-01-11T22:54:21.6858450Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6858578Z test(self, **param_kwargs) 2023-01-11T22:54:21.6858935Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6859043Z return func(*args, **kwargs) 2023-01-11T22:54:21.6859364Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6859485Z self.run_subtests( 2023-01-11T22:54:21.6859847Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6860014Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6860385Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6860541Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6860922Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6861025Z output = model(*input) 2023-01-11T22:54:21.6861354Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6861493Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6861873Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.6862049Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6862416Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6862538Z _lazy_init(state, module) 2023-01-11T22:54:21.6862892Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6863064Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6863446Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6863591Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6863932Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6864063Z return func(*args, **kwargs) 2023-01-11T22:54:21.6864445Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6864548Z p_assert( 2023-01-11T22:54:21.6864885Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6865012Z traceback.print_stack() 2023-01-11T22:54:21.6865233Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6865474Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6865606Z File "", line 1, in 2023-01-11T22:54:21.6865818Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6865963Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6866168Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6866320Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6866518Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6866624Z self.run() 2023-01-11T22:54:21.6866830Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6866978Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6867326Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6867528Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6867898Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6868025Z getattr(self, test_name)() 2023-01-11T22:54:21.6868366Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6868468Z fn() 2023-01-11T22:54:21.6868883Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6869018Z test(self, **param_kwargs) 2023-01-11T22:54:21.6869385Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6869512Z return func(*args, **kwargs) 2023-01-11T22:54:21.6869767Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6869887Z self.run_subtests( 2023-01-11T22:54:21.6870222Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6870387Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6870756Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6870915Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6871294Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6871417Z output = model(*input) 2023-01-11T22:54:21.6871744Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6871882Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6872241Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6872418Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6872789Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6872910Z _lazy_init(state, module) 2023-01-11T22:54:21.6873266Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6873432Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6873828Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6873976Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6874295Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6874422Z return func(*args, **kwargs) 2023-01-11T22:54:21.6874804Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6874909Z p_assert( 2023-01-11T22:54:21.6875246Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6875375Z traceback.print_stack() 2023-01-11T22:54:21.6875504Z File "", line 1, in 2023-01-11T22:54:21.6875720Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6875847Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6876049Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6876201Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6876414Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6876519Z self.run() 2023-01-11T22:54:21.6876801Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6876951Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6877280Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6877414Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6877834Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6877963Z getattr(self, test_name)() 2023-01-11T22:54:21.6878332Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6878433Z fn() 2023-01-11T22:54:21.6878796Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6878922Z test(self, **param_kwargs) 2023-01-11T22:54:21.6879259Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6879394Z return func(*args, **kwargs) 2023-01-11T22:54:21.6879647Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6879762Z self.run_subtests( 2023-01-11T22:54:21.6880118Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6880284Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6880650Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6880804Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6881160Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6881282Z output = model(*input) 2023-01-11T22:54:21.6881615Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6881757Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6882134Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6882310Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6882679Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6882802Z _lazy_init(state, module) 2023-01-11T22:54:21.6883137Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6883307Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6883734Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6883886Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6884230Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6884355Z return func(*args, **kwargs) 2023-01-11T22:54:21.6884732Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6884834Z p_assert( 2023-01-11T22:54:21.6885157Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6885286Z traceback.print_stack() 2023-01-11T22:54:21.6885525Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6885762Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6885892Z File "", line 1, in 2023-01-11T22:54:21.6886105Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6886319Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6886525Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6886658Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6886872Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6886978Z self.run() 2023-01-11T22:54:21.6887267Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6887421Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6887777Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6887913Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6888275Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6888387Z getattr(self, test_name)() 2023-01-11T22:54:21.6888748Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6888846Z fn() 2023-01-11T22:54:21.6889210Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6889335Z test(self, **param_kwargs) 2023-01-11T22:54:21.6889698Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6889823Z return func(*args, **kwargs) 2023-01-11T22:54:21.6890073Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6890170Z self.run_subtests( 2023-01-11T22:54:21.6890522Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6890690Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6891054Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6891207Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6891580Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6891700Z output = model(*input) 2023-01-11T22:54:21.6892027Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6892149Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6892526Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6892702Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6893302Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.6893434Z _lazy_init(state, module) 2023-01-11T22:54:21.6893793Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6893963Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6894365Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6894492Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6894833Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6894961Z return func(*args, **kwargs) 2023-01-11T22:54:21.6895341Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6895446Z p_assert( 2023-01-11T22:54:21.6895882Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6896013Z traceback.print_stack() 2023-01-11T22:54:21.6896144Z File "", line 1, in 2023-01-11T22:54:21.6896338Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6896483Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6896750Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6896912Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6897129Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6897236Z self.run() 2023-01-11T22:54:21.6897439Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6897568Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6897919Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6898060Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.6898428Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6898554Z getattr(self, test_name)() 2023-01-11T22:54:21.6898917Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6899025Z fn() 2023-01-11T22:54:21.6899392Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6899498Z test(self, **param_kwargs) 2023-01-11T22:54:21.6899854Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6899980Z return func(*args, **kwargs) 2023-01-11T22:54:21.6900237Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6900357Z self.run_subtests( 2023-01-11T22:54:21.6900708Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6900873Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6901240Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6901381Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6901759Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6901882Z output = model(*input) 2023-01-11T22:54:21.6902211Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6902352Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6902734Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.6902914Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6903281Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6903387Z _lazy_init(state, module) 2023-01-11T22:54:21.6903748Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6903920Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6904320Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6904465Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6904804Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6905000Z return func(*args, **kwargs) 2023-01-11T22:54:21.6905385Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6905472Z p_assert( 2023-01-11T22:54:21.6905812Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6905943Z traceback.print_stack() 2023-01-11T22:54:21.6906232Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6906476Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6907237Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.6907989Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6908126Z File "", line 1, in 2023-01-11T22:54:21.6908343Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6908489Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6908676Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6908830Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6909045Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6909150Z self.run() 2023-01-11T22:54:21.6909352Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6909504Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6909849Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6909966Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6910329Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6910455Z getattr(self, test_name)() 2023-01-11T22:54:21.6910820Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6910922Z fn() 2023-01-11T22:54:21.6911287Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6911413Z test(self, **param_kwargs) 2023-01-11T22:54:21.6911770Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:54:21.6911882Z return func(*args, **kwargs) 2023-01-11T22:54:21.6912136Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6912254Z self.run_subtests( 2023-01-11T22:54:21.6912607Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6912770Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6913137Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6913292Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6913670Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6913773Z output = model(*input) 2023-01-11T22:54:21.6914101Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6914310Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6914689Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6914867Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6915234Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6915407Z _lazy_init(state, module) 2023-01-11T22:54:21.6915776Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6915947Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6916327Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6916473Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6916814Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6916944Z return func(*args, **kwargs) 2023-01-11T22:54:21.6917326Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6917427Z p_assert( 2023-01-11T22:54:21.6917769Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6917899Z traceback.print_stack() 2023-01-11T22:54:21.6918013Z File "", line 1, in 2023-01-11T22:54:21.6918226Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6918371Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6918574Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6918727Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6918945Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6919050Z self.run() 2023-01-11T22:54:21.6919235Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6919384Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6919730Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6919867Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6920230Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6920353Z getattr(self, test_name)() 2023-01-11T22:54:21.6920714Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6920813Z fn() 2023-01-11T22:54:21.6921162Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6921291Z test(self, **param_kwargs) 2023-01-11T22:54:21.6921650Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6921775Z return func(*args, **kwargs) 2023-01-11T22:54:21.6922029Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6922146Z self.run_subtests( 2023-01-11T22:54:21.6922499Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6922663Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6923011Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6923166Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6923621Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6923744Z output = model(*input) 2023-01-11T22:54:21.6924070Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6924211Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6924636Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6924821Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6925178Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6925302Z _lazy_init(state, module) 2023-01-11T22:54:21.6925660Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:54:21.6925832Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6926237Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6926382Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6926722Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6926848Z return func(*args, **kwargs) 2023-01-11T22:54:21.6927209Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6927312Z p_assert( 2023-01-11T22:54:21.6927652Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6927779Z traceback.print_stack() 2023-01-11T22:54:21.6928019Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6928254Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6928388Z File "", line 1, in 2023-01-11T22:54:21.6928599Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6928726Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6928929Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6929084Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6929299Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6929404Z self.run() 2023-01-11T22:54:21.6929608Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6929755Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6930083Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6930222Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6930585Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6930710Z getattr(self, test_name)() 2023-01-11T22:54:21.6931067Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6931167Z fn() 2023-01-11T22:54:21.6931534Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6931658Z test(self, **param_kwargs) 2023-01-11T22:54:21.6931997Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6932123Z return func(*args, **kwargs) 2023-01-11T22:54:21.6932376Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6932490Z self.run_subtests( 2023-01-11T22:54:21.6933130Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6933300Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6933673Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6933828Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6934260Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6934392Z output = model(*input) 2023-01-11T22:54:21.6934733Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6934876Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6935253Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6935434Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6935800Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6935923Z _lazy_init(state, module) 2023-01-11T22:54:21.6936259Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6936433Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6936833Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6936978Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6937317Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6937444Z return func(*args, **kwargs) 2023-01-11T22:54:21.6937823Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6937931Z p_assert( 2023-01-11T22:54:21.6938252Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6938380Z traceback.print_stack() 2023-01-11T22:54:21.6938510Z File "", line 1, in 2023-01-11T22:54:21.6938727Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6938871Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6939075Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6939229Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6939444Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6939531Z self.run() 2023-01-11T22:54:21.6939735Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6939885Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6940223Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6940358Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6940720Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6940849Z getattr(self, test_name)() 2023-01-11T22:54:21.6941209Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6941291Z fn() 2023-01-11T22:54:21.6941656Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6941780Z test(self, **param_kwargs) 2023-01-11T22:54:21.6942136Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.6942359Z return func(*args, **kwargs) 2023-01-11T22:54:21.6942616Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6942733Z self.run_subtests( 2023-01-11T22:54:21.6943073Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6943298Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6943684Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6943842Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6944222Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6944346Z output = model(*input) 2023-01-11T22:54:21.6944673Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6944819Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6945201Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6945358Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6945728Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6945851Z _lazy_init(state, module) 2023-01-11T22:54:21.6946207Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6946376Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6946774Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6946916Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.6947257Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6947365Z return func(*args, **kwargs) 2023-01-11T22:54:21.6947739Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6947848Z p_assert( 2023-01-11T22:54:21.6948192Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6948319Z traceback.print_stack() 2023-01-11T22:54:21.6948555Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6948791Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6948903Z File "", line 1, in 2023-01-11T22:54:21.6949116Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6949262Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6949463Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6949614Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6949827Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6949930Z self.run() 2023-01-11T22:54:21.6950134Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6950265Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6950610Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6950743Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6951105Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6951228Z getattr(self, test_name)() 2023-01-11T22:54:21.6951662Z 
File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6951763Z fn() 2023-01-11T22:54:21.6952126Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6952232Z test(self, **param_kwargs) 2023-01-11T22:54:21.6952639Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6952771Z return func(*args, **kwargs) 2023-01-11T22:54:21.6953025Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6953141Z self.run_subtests( 2023-01-11T22:54:21.6953500Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6953663Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6954034Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6954170Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6954546Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6954667Z output = model(*input) 2023-01-11T22:54:21.6963190Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6963384Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6963823Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6964004Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6964383Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 
2023-01-11T22:54:21.6964512Z _lazy_init(state, module) 2023-01-11T22:54:21.6964855Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6965028Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6965433Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6965584Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6965930Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6966062Z return func(*args, **kwargs) 2023-01-11T22:54:21.6966442Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6966549Z p_assert( 2023-01-11T22:54:21.6966872Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6967007Z traceback.print_stack() 2023-01-11T22:54:21.6967139Z File "", line 1, in 2023-01-11T22:54:21.6967354Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6967501Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6967704Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6967860Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6968057Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6968163Z self.run() 2023-01-11T22:54:21.6968366Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6968512Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6968854Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6969113Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.6969488Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6969614Z getattr(self, test_name)() 2023-01-11T22:54:21.6969958Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6970056Z fn() 2023-01-11T22:54:21.6970483Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6970614Z test(self, **param_kwargs) 2023-01-11T22:54:21.6970979Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6971104Z return func(*args, **kwargs) 2023-01-11T22:54:21.6971365Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6971485Z self.run_subtests( 2023-01-11T22:54:21.6971820Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6971986Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6972351Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6972504Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6973108Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6973244Z output = model(*input) 2023-01-11T22:54:21.6973582Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6973723Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6974083Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in 
forward 2023-01-11T22:54:21.6974265Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6974639Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6974763Z _lazy_init(state, module) 2023-01-11T22:54:21.6975117Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6975292Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6975689Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6975833Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6976155Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6976282Z return func(*args, **kwargs) 2023-01-11T22:54:21.6976667Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6976771Z p_assert( 2023-01-11T22:54:21.6977108Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6977237Z traceback.print_stack() 2023-01-11T22:54:21.6977478Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6977723Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.6977836Z File "", line 1, in 2023-01-11T22:54:21.6978050Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6978196Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6978402Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6978556Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6978884Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6978992Z self.run() 2023-01-11T22:54:21.6979196Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6979328Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6979684Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6979886Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6980268Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6980396Z getattr(self, test_name)() 2023-01-11T22:54:21.6980758Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6980861Z fn() 2023-01-11T22:54:21.6981206Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6981338Z test(self, **param_kwargs) 2023-01-11T22:54:21.6981696Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6981824Z return func(*args, **kwargs) 2023-01-11T22:54:21.6982078Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6982198Z self.run_subtests( 2023-01-11T22:54:21.6982551Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6982717Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6983066Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6983223Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6983609Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6983731Z output = model(*input) 2023-01-11T22:54:21.6984058Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6984197Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6984609Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6984787Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6984900Z File "", line 1, in 2023-01-11T22:54:21.6985270Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6985394Z _lazy_init(state, module) 2023-01-11T22:54:21.6985748Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6985924Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6986138Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.6986283Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.6986684Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6986813Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6987017Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.6987170Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.6987510Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6987638Z return func(*args, **kwargs) 2023-01-11T22:54:21.6987854Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.6988024Z self.run() 2023-01-11T22:54:21.6988408Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6988494Z p_assert( 2023-01-11T22:54:21.6988703Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.6988855Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.6989239Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6989372Z traceback.print_stack() 2023-01-11T22:54:21.6989716Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.6989852Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.6990216Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.6990324Z getattr(self, test_name)() 2023-01-11T22:54:21.6990697Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.6990798Z fn() 2023-01-11T22:54:21.6991168Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.6991292Z test(self, **param_kwargs) 2023-01-11T22:54:21.6991650Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.6991778Z 
return func(*args, **kwargs) 2023-01-11T22:54:21.6992033Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.6992130Z self.run_subtests( 2023-01-11T22:54:21.6992484Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.6992647Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.6993016Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.6993169Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.6993546Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.6993668Z output = model(*input) 2023-01-11T22:54:21.6993997Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.6994121Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.6994502Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.6994677Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.6995048Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.6995175Z _lazy_init(state, module) 2023-01-11T22:54:21.6995533Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.6995703Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.6996103Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.6996231Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.6996572Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.6996700Z return func(*args, **kwargs) 2023-01-11T22:54:21.6997082Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.6997187Z p_assert( 2023-01-11T22:54:21.6997526Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.6997714Z traceback.print_stack() 2023-01-11T22:54:21.6997954Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6998174Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.6998988Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.6999751Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7000501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7001247Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7001984Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7002724Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7003460Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7004199Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7004336Z File "", line 1, in 2023-01-11T22:54:21.7004555Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7004700Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7004907Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7005060Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7005279Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7005368Z self.run() 2023-01-11T22:54:21.7005575Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7005724Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7006071Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7006208Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7006574Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7006757Z getattr(self, test_name)() 2023-01-11T22:54:21.7007126Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7007209Z fn() 2023-01-11T22:54:21.7007577Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7007754Z test(self, **param_kwargs) 2023-01-11T22:54:21.7008126Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7008254Z return func(*args, **kwargs) 2023-01-11T22:54:21.7008507Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.7008624Z self.run_subtests( 2023-01-11T22:54:21.7008960Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7009133Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7009499Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7009654Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7010032Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7010156Z output = model(*input) 2023-01-11T22:54:21.7010485Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7010626Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7011007Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7011166Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7011544Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7011668Z _lazy_init(state, module) 2023-01-11T22:54:21.7012028Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7012199Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7012602Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7012749Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7013337Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7013448Z return func(*args, **kwargs) 2023-01-11T22:54:21.7013836Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7013947Z p_assert( 2023-01-11T22:54:21.7014288Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7014419Z traceback.print_stack() 2023-01-11T22:54:21.7014552Z File "", line 1, in 2023-01-11T22:54:21.7014769Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7014899Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7015107Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7015260Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7015475Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7015581Z self.run() 2023-01-11T22:54:21.7015787Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7015937Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7016399Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7016516Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7016882Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7017009Z getattr(self, test_name)() 2023-01-11T22:54:21.7017433Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7017541Z fn() 2023-01-11T22:54:21.7017916Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7018043Z test(self, **param_kwargs) 2023-01-11T22:54:21.7018401Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.7018508Z return func(*args, **kwargs) 2023-01-11T22:54:21.7018772Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 107, in test_nested_wrapped_model 2023-01-11T22:54:21.7018889Z self.run_subtests( 2023-01-11T22:54:21.7019243Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7019410Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7019778Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7019933Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7020311Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7020413Z output = model(*input) 2023-01-11T22:54:21.7020741Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7020885Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7021269Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7021448Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7021818Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7021944Z _lazy_init(state, module) 2023-01-11T22:54:21.7022303Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7022455Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7022854Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7023000Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.7023343Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7023475Z return func(*args, **kwargs) 2023-01-11T22:54:21.7023859Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7023964Z p_assert( 2023-01-11T22:54:21.7024306Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7024420Z traceback.print_stack() 2023-01-11T22:54:21.7024660Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7024900Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7025135Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7025370Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7025664Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7025893Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7026123Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7026333Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7026605Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7026839Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7027069Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.7027296Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7027525Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7027757Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7027985Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7028210Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7028420Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7028651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7028877Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7029102Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7029331Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7029557Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7029788Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7030014Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7030787Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7031519Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7032264Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7033011Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7033746Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7034480Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7035340Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7036086Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7036818Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7037556Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7038284Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7039011Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7039747Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7040479Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7041210Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7041949Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7042678Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7043404Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7044284Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7045023Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7045262Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7045504Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7045739Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7045973Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7046204Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7046440Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7046669Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7046880Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7047110Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7047339Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7047570Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7047799Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.7048029Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7048260Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7048488Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7048695Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7048922Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7049148Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7049376Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7049603Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7049829Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7050055Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7050282Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7050505Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.7051528Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 
2023-01-11T22:54:21.7051700Z world_indices[ 2023-01-11T22:54:21.7052497Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7053593Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7054343Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7055092Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7055826Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7056555Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7056675Z dist init r=1, world=2 2023-01-11T22:54:21.7057010Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7057332Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7057643Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7057950Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7058259Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7058560Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7058865Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7059162Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7059445Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7059745Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7060136Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7060437Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7060602Z dist init r=0, world=2 2023-01-11T22:54:21.7060932Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7061246Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7061555Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7061865Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7062170Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7062476Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7062761Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7063064Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7063369Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7063672Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7063978Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7064280Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7064385Z ok (5.715s) 2023-01-11T22:54:21.7064768Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88947 2023-01-11T22:54:21.7064993Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88948 2023-01-11T22:54:21.7065388Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.7065566Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.7065938Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.7066134Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.7066507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.7066686Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.7067064Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.7067316Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.7067563Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.7067809Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.7068247Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.7068662Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.7068893Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.7069124Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.7070149Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.7070269Z warnings.warn( 2023-01-11T22:54:21.7071292Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.7071407Z warnings.warn( 2023-01-11T22:54:21.7072159Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7072912Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7073651Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7074390Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7075135Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7075870Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7076602Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7077459Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7078202Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7078937Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7079674Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7080405Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7081133Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7081870Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7082597Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7083324Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7084056Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7084810Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7085540Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7086373Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7087117Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7087847Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7088580Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7089309Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7090038Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7090773Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7091500Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7092227Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7093109Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7093851Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7094579Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7095454Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7096200Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7096928Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7097661Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7098391Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7099119Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7099852Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7100577Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7101304Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7102016Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7102745Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7103474Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7104313Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7105050Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7105778Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7106511Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7107238Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7107963Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7108698Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7109427Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7110153Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7110884Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7111613Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7112335Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7113166Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7113902Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7114627Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7115360Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7116084Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7116808Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7117541Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7118265Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7119021Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7119751Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7120482Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7121203Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7122035Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7122771Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7123495Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7124227Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7124952Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7125674Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7126403Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7127124Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7127849Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7128581Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7129308Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7130033Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7130815Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7131579Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7132314Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7133244Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7133986Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7134710Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7135439Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7136166Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7136891Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7137618Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7138346Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7139067Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7139895Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7140675Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7141417Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7142149Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7142877Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7143604Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7144334Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7145059Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7145784Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7145904Z dist init r=0, world=2 2023-01-11T22:54:21.7146018Z dist init r=1, world=2 2023-01-11T22:54:21.7146121Z ok (4.613s) 2023-01-11T22:54:21.7146481Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89030 2023-01-11T22:54:21.7146704Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89031 2023-01-11T22:54:21.7147083Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.7147263Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.7147654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.7147916Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.7148294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.7148475Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.7148854Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.7149078Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T22:54:21.7149333Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.7149580Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.7149992Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.7150388Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.7150628Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.7150860Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.7151887Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.7152006Z warnings.warn( 2023-01-11T22:54:21.7153060Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.7153181Z warnings.warn( 2023-01-11T22:54:21.7153918Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7154658Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7155403Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7156145Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7156879Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7157687Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7158460Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7159204Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7159939Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7160672Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7161402Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7162138Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7162867Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7163599Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7164335Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7165069Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7165801Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7166599Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7167377Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7168117Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7168846Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7169577Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7170304Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7171032Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7171760Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7172489Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7173403Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7174145Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7174263Z dist init r=1, world=2 2023-01-11T22:54:21.7174376Z dist init r=0, world=2 2023-01-11T22:54:21.7174481Z ok (4.714s) 2023-01-11T22:54:21.7174873Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89113 2023-01-11T22:54:21.7175179Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89114 2023-01-11T22:54:21.7175560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.7175720Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.7176201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.7176405Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.7176782Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.7176964Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.7177344Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.7177544Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.7177793Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.7178020Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.7178429Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.7178830Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.7179061Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.7179289Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.7180314Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.7180437Z warnings.warn( 2023-01-11T22:54:21.7181456Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.7181571Z warnings.warn( 2023-01-11T22:54:21.7182324Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7183079Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7183818Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7184558Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7185432Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7186178Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7186911Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7187651Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7188383Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7189111Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7189845Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7190573Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7191307Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7192044Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7192770Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7193499Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7194340Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7195075Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7195806Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7196541Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7197271Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7197999Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7198733Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7199461Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7200191Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7200928Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7201656Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7202382Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7202552Z dist init r=0, world=2 2023-01-11T22:54:21.7202666Z dist init r=1, world=2 2023-01-11T22:54:21.7202769Z ok (4.813s) 2023-01-11T22:54:21.7203195Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89196 2023-01-11T22:54:21.7203423Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89197 2023-01-11T22:54:21.7203803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.7203982Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.7204363Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.7204542Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.7204913Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.7205090Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.7205472Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.7205665Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.7205912Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.7206160Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.7206562Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.7206964Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.7207177Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.7207407Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.7208435Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.7208554Z warnings.warn( 2023-01-11T22:54:21.7209577Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.7209694Z warnings.warn( 2023-01-11T22:54:21.7209829Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.7210045Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7210192Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7210399Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7210535Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7210753Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7210922Z self.run() 2023-01-11T22:54:21.7211132Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7211283Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7211635Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7211774Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7212196Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7212310Z getattr(self, test_name)() 2023-01-11T22:54:21.7212681Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7212783Z fn() 2023-01-11T22:54:21.7213311Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7213444Z test(self, **param_kwargs) 2023-01-11T22:54:21.7213807Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7213935Z return func(*args, **kwargs) 2023-01-11T22:54:21.7214235Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7214333Z self.run_subtests(
2023-01-11T22:54:21.7214692Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7214857Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7215227Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7215384Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7215762Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7215889Z output = model(*input) 2023-01-11T22:54:21.7216217Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7216339Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7216717Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7216899Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7217271Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7217395Z _lazy_init(state, module) 2023-01-11T22:54:21.7217750Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7217921Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7218328Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7218455Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7218794Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7218922Z return func(*args, **kwargs) 2023-01-11T22:54:21.7219306Z File
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7219411Z p_assert( 2023-01-11T22:54:21.7219751Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7219882Z traceback.print_stack() 2023-01-11T22:54:21.7220014Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.7220206Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7220438Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7220642Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7220794Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7221010Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7221117Z self.run() 2023-01-11T22:54:21.7221325Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7221531Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7221873Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7222009Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7222373Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7222503Z getattr(self, test_name)() 2023-01-11T22:54:21.7222867Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7222971Z fn() 2023-01-11T22:54:21.7223338Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7223444Z test(self, **param_kwargs) 2023-01-11T22:54:21.7223805Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in
wrapper 2023-01-11T22:54:21.7223935Z return func(*args, **kwargs) 2023-01-11T22:54:21.7224233Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7224349Z self.run_subtests( 2023-01-11T22:54:21.7224706Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7224872Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7225243Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7225398Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7225757Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7225880Z output = model(*input) 2023-01-11T22:54:21.7226213Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7226356Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7226732Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7226911Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7227278Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7227406Z _lazy_init(state, module) 2023-01-11T22:54:21.7227746Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7227917Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7228321Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7228470Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.7228812Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7228939Z return func(*args, **kwargs) 2023-01-11T22:54:21.7229316Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7229421Z p_assert( 2023-01-11T22:54:21.7229739Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7229933Z traceback.print_stack() 2023-01-11T22:54:21.7230068Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.7230282Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7230428Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7230634Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7230828Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7231032Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7231139Z self.run() 2023-01-11T22:54:21.7231345Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7231494Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7231842Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7231982Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7232345Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7232471Z getattr(self, test_name)() 2023-01-11T22:54:21.7232815Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7232916Z fn() 2023-01-11T22:54:21.7233286Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7233412Z test(self, **param_kwargs) 2023-01-11T22:54:21.7233771Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7233899Z return func(*args, **kwargs) 2023-01-11T22:54:21.7234199Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7234320Z self.run_subtests( 2023-01-11T22:54:21.7234657Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7234821Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7235189Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7235347Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7235724Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7235847Z output = model(*input) 2023-01-11T22:54:21.7236174Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7236320Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7236682Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7236864Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7237233Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7237357Z _lazy_init(state, module) 2023-01-11T22:54:21.7237717Z File
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7237887Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7238287Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7238434Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7238754Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7238884Z return func(*args, **kwargs) 2023-01-11T22:54:21.7239335Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7239442Z p_assert( 2023-01-11T22:54:21.7239782Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7239912Z traceback.print_stack() 2023-01-11T22:54:21.7240045Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.7240305Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7240439Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7240644Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7240799Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7241014Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7241121Z self.run() 2023-01-11T22:54:21.7241331Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7241481Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7241829Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7241946Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7242312Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7242441Z getattr(self, test_name)() 2023-01-11T22:54:21.7242807Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7242907Z fn() 2023-01-11T22:54:21.7243274Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7243401Z test(self, **param_kwargs) 2023-01-11T22:54:21.7243741Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7243874Z return func(*args, **kwargs) 2023-01-11T22:54:21.7244172Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7244289Z self.run_subtests( 2023-01-11T22:54:21.7244648Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7244816Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7245182Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7245339Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7245716Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7245819Z output = model(*input) 2023-01-11T22:54:21.7246154Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7246295Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7246674Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:54:21.7246851Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7247227Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7247348Z _lazy_init(state, module) 2023-01-11T22:54:21.7247703Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7247855Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7248254Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7248471Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7248818Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7248945Z return func(*args, **kwargs) 2023-01-11T22:54:21.7249322Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7249474Z p_assert( 2023-01-11T22:54:21.7249824Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7249935Z traceback.print_stack() 2023-01-11T22:54:21.7250066Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.7250277Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7250423Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7250629Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7250787Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7251003Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7251092Z self.run() 2023-01-11T22:54:21.7251298Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
2023-01-11T22:54:21.7251447Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7251792Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7251926Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7252291Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7252414Z getattr(self, test_name)() 2023-01-11T22:54:21.7252780Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7253021Z fn() 2023-01-11T22:54:21.7253409Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7253534Z test(self, **param_kwargs) 2023-01-11T22:54:21.7253895Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7254023Z return func(*args, **kwargs) 2023-01-11T22:54:21.7254327Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7254444Z self.run_subtests( 2023-01-11T22:54:21.7254796Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7254943Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7255311Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7255472Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7255851Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7255974Z output = model(*input) 2023-01-11T22:54:21.7256304Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7256447Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7256830Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7256990Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7257357Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7257481Z _lazy_init(state, module) 2023-01-11T22:54:21.7257835Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7258093Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7258499Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7258645Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7259043Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7259163Z return func(*args, **kwargs) 2023-01-11T22:54:21.7259554Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7259660Z p_assert( 2023-01-11T22:54:21.7259999Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7260130Z traceback.print_stack() 2023-01-11T22:54:21.7260268Z File "", line 1, in 2023-01-11T22:54:21.7260479Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7260624Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7260811Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7260963Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7261181Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7261289Z self.run() 2023-01-11T22:54:21.7261496Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7261643Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7261987Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7262120Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7262467Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7262598Z getattr(self, test_name)() 2023-01-11T22:54:21.7262962Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7263063Z fn() 2023-01-11T22:54:21.7263431Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7263560Z test(self, **param_kwargs) 2023-01-11T22:54:21.7263919Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7264028Z return func(*args, **kwargs) 2023-01-11T22:54:21.7264327Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7264445Z self.run_subtests( 2023-01-11T22:54:21.7264801Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7264972Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7265336Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7265492Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:54:21.7265871Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7265994Z output = model(*input) 2023-01-11T22:54:21.7266304Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7266440Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7266817Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7266993Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7267445Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7267570Z _lazy_init(state, module) 2023-01-11T22:54:21.7267926Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7268097Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7268522Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7268675Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7269025Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7269154Z return func(*args, **kwargs) 2023-01-11T22:54:21.7269534Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7269645Z p_assert( 2023-01-11T22:54:21.7269985Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7270114Z traceback.print_stack() 2023-01-11T22:54:21.7270854Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still 
has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7271606Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7271742Z File "", line 1, in 2023-01-11T22:54:21.7271960Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7272105Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7272310Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7272464Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7272683Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7272793Z self.run() 2023-01-11T22:54:21.7272980Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7273127Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7273473Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7273610Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7273971Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7274101Z getattr(self, test_name)() 2023-01-11T22:54:21.7274462Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7274561Z fn() 2023-01-11T22:54:21.7274908Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7275035Z test(self, **param_kwargs) 2023-01-11T22:54:21.7275395Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7275524Z return func(*args, **kwargs) 2023-01-11T22:54:21.7275823Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7275940Z self.run_subtests( 2023-01-11T22:54:21.7276297Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7276520Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7276876Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7277029Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7277451Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7277577Z output = model(*input) 2023-01-11T22:54:21.7277909Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7278050Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7278426Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7278605Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7278954Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7279078Z _lazy_init(state, module) 2023-01-11T22:54:21.7279432Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7279602Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7280003Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7280151Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7280491Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7280619Z return func(*args, **kwargs) 2023-01-11T22:54:21.7280979Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7281089Z p_assert( 2023-01-11T22:54:21.7281429Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7281557Z traceback.print_stack() 2023-01-11T22:54:21.7281689Z File "", line 1, in 2023-01-11T22:54:21.7281901Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7282044Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7282251Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7282384Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7282600Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7282707Z self.run() 2023-01-11T22:54:21.7282913Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7283061Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7283407Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7283543Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7283888Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7284013Z getattr(self, test_name)() 2023-01-11T22:54:21.7284379Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7284480Z fn() 2023-01-11T22:54:21.7284845Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7284972Z test(self, **param_kwargs) 2023-01-11T22:54:21.7285329Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7285458Z return func(*args, **kwargs) 2023-01-11T22:54:21.7285824Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7285941Z self.run_subtests( 2023-01-11T22:54:21.7286301Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7286467Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7286883Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7287045Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7287425Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7287552Z output = model(*input) 2023-01-11T22:54:21.7287861Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7288009Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7288389Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:54:21.7288568Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7288939Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7289065Z _lazy_init(state, module) 2023-01-11T22:54:21.7289424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7289594Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7289991Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7290117Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7290455Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7290587Z return func(*args, **kwargs) 2023-01-11T22:54:21.7290965Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7291072Z p_assert( 2023-01-11T22:54:21.7291413Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7291544Z traceback.print_stack() 2023-01-11T22:54:21.7291659Z File "", line 1, in 2023-01-11T22:54:21.7291871Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7292017Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7292222Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7292377Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7292593Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7292705Z self.run() 2023-01-11T22:54:21.7293097Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:54:21.7293237Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7293590Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7293726Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7294097Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7294223Z getattr(self, test_name)() 2023-01-11T22:54:21.7294586Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7294688Z fn() 2023-01-11T22:54:21.7295058Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7295253Z test(self, **param_kwargs) 2023-01-11T22:54:21.7295621Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7295748Z return func(*args, **kwargs) 2023-01-11T22:54:21.7296047Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7296221Z self.run_subtests( 2023-01-11T22:54:21.7296592Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7296761Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7297126Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7297263Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7297639Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7297766Z output = model(*input) 2023-01-11T22:54:21.7298094Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7298234Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7298616Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7298794Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7299162Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7299267Z _lazy_init(state, module) 2023-01-11T22:54:21.7299622Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7299794Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7300199Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7300344Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7300682Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7300808Z return func(*args, **kwargs) 2023-01-11T22:54:21.7301190Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7301277Z p_assert( 2023-01-11T22:54:21.7301614Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7301744Z traceback.print_stack() 2023-01-11T22:54:21.7301873Z File "", line 1, in 2023-01-11T22:54:21.7302085Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7302234Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7302442Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7302596Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7302792Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7302899Z self.run() 2023-01-11T22:54:21.7303106Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7303254Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7303598Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7303735Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7304101Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7304208Z getattr(self, test_name)() 2023-01-11T22:54:21.7304649Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7304747Z fn() 2023-01-11T22:54:21.7305117Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7305245Z test(self, **param_kwargs) 2023-01-11T22:54:21.7305652Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7305786Z return func(*args, **kwargs) 2023-01-11T22:54:21.7306085Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7306182Z self.run_subtests( 2023-01-11T22:54:21.7306544Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7306708Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7307080Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7307235Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:54:21.7307609Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7307731Z output = model(*input) 2023-01-11T22:54:21.7308063Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7308185Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7308565Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7308744Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7309111Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7309239Z _lazy_init(state, module) 2023-01-11T22:54:21.7309595Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7309766Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7310163Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7310314Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7310637Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7310764Z return func(*args, **kwargs) 2023-01-11T22:54:21.7311140Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7311246Z p_assert( 2023-01-11T22:54:21.7311583Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7311714Z traceback.print_stack() 2023-01-11T22:54:21.7311846Z File "", line 1, in 2023-01-11T22:54:21.7312039Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7312183Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7312388Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7312548Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7312761Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7312867Z self.run() 2023-01-11T22:54:21.7313071Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7313219Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7313540Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7313739Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7314106Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7314229Z getattr(self, test_name)() 2023-01-11T22:54:21.7314592Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7314692Z fn() 2023-01-11T22:54:21.7315105Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7315237Z test(self, **param_kwargs) 2023-01-11T22:54:21.7315583Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7315711Z return func(*args, **kwargs) 2023-01-11T22:54:21.7316008Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7316128Z self.run_subtests( 2023-01-11T22:54:21.7316482Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 
2023-01-11T22:54:21.7316646Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7317012Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7317171Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7317532Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7317655Z output = model(*input) 2023-01-11T22:54:21.7317983Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7318124Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7318501Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7318683Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7319051Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7319174Z _lazy_init(state, module) 2023-01-11T22:54:21.7319512Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7319680Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7320078Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7320226Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7320565Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7320697Z return func(*args, **kwargs) 2023-01-11T22:54:21.7321076Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7321181Z p_assert( 
2023-01-11T22:54:21.7321498Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7321629Z traceback.print_stack() 2023-01-11T22:54:21.7321764Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.7321975Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7322118Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7322324Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7322479Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7322693Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7322882Z self.run() 2023-01-11T22:54:21.7323088Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7323236Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7323583Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7323719Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7324138Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7324271Z getattr(self, test_name)() 2023-01-11T22:54:21.7324622Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7324722Z fn() 2023-01-11T22:54:21.7325091Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7325218Z test(self, **param_kwargs) 2023-01-11T22:54:21.7325586Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7325713Z return func(*args, **kwargs) 2023-01-11T22:54:21.7326011Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7326128Z self.run_subtests( 2023-01-11T22:54:21.7326466Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7326630Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7326996Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7327152Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7327531Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7327657Z output = model(*input) 2023-01-11T22:54:21.7327985Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7328126Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7328484Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7328665Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7329037Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7329163Z _lazy_init(state, module) 2023-01-11T22:54:21.7329521Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7329690Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7330091Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7330240Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7330580Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7330688Z return func(*args, **kwargs) 2023-01-11T22:54:21.7331068Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7331173Z p_assert( 2023-01-11T22:54:21.7331511Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7331642Z traceback.print_stack() 2023-01-11T22:54:21.7331775Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.7331993Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7332119Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7332401Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7332554Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7332769Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7333022Z self.run() 2023-01-11T22:54:21.7333235Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7333385Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7333813Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7333940Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7334311Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7334438Z getattr(self, test_name)() 2023-01-11T22:54:21.7334804Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7334909Z fn() 2023-01-11T22:54:21.7335278Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:54:21.7335405Z test(self, **param_kwargs) 2023-01-11T22:54:21.7335761Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7335870Z return func(*args, **kwargs) 2023-01-11T22:54:21.7336173Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7336290Z self.run_subtests( 2023-01-11T22:54:21.7336645Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7336812Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7337180Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7337341Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7337720Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7337823Z output = model(*input) 2023-01-11T22:54:21.7338151Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7338297Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7338677Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7338855Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7339222Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7339346Z _lazy_init(state, module) 2023-01-11T22:54:21.7339704Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7339856Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:54:21.7340258Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7340407Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7340754Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7340883Z return func(*args, **kwargs) 2023-01-11T22:54:21.7341262Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7341366Z p_assert( 2023-01-11T22:54:21.7341703Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7341894Z traceback.print_stack() 2023-01-11T22:54:21.7342027Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.7342240Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7342391Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7342597Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7342751Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7343009Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7343119Z self.run() 2023-01-11T22:54:21.7343307Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7343458Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7343808Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7343944Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7344313Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7344438Z getattr(self, test_name)() 2023-01-11T22:54:21.7344801Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7344882Z fn() 2023-01-11T22:54:21.7345255Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7345381Z test(self, **param_kwargs) 2023-01-11T22:54:21.7345740Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7345866Z return func(*args, **kwargs) 2023-01-11T22:54:21.7346160Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7346277Z self.run_subtests( 2023-01-11T22:54:21.7346636Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7346782Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7347148Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7347302Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7347682Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7347803Z output = model(*input) 2023-01-11T22:54:21.7348131Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7348272Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7348650Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7348831Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7349182Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:54:21.7349306Z _lazy_init(state, module) 2023-01-11T22:54:21.7349663Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7349838Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7350238Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7350384Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7350720Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7350847Z return func(*args, **kwargs) 2023-01-11T22:54:21.7351207Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7351377Z p_assert( 2023-01-11T22:54:21.7351725Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7351853Z traceback.print_stack() 2023-01-11T22:54:21.7351983Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.7352244Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7352397Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7352582Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7352737Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7352954Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7353061Z self.run() 2023-01-11T22:54:21.7353267Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7353420Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7353765Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7353900Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.7354245Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7354370Z getattr(self, test_name)() 2023-01-11T22:54:21.7354737Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7354840Z fn() 2023-01-11T22:54:21.7355207Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7355332Z test(self, **param_kwargs) 2023-01-11T22:54:21.7355690Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7355820Z return func(*args, **kwargs) 2023-01-11T22:54:21.7356098Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7356216Z self.run_subtests( 2023-01-11T22:54:21.7356572Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7356738Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7357108Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7357264Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7357642Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7357764Z output = model(*input) 2023-01-11T22:54:21.7358071Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7358216Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7358598Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7358775Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7359146Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7359269Z _lazy_init(state, module) 2023-01-11T22:54:21.7359623Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7359791Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7360172Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7360379Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7360726Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7360853Z return func(*args, **kwargs) 2023-01-11T22:54:21.7361232Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7361341Z p_assert( 2023-01-11T22:54:21.7361726Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7361860Z traceback.print_stack() 2023-01-11T22:54:21.7361973Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.7362187Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7362332Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7362537Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7362695Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7362910Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7363017Z self.run() 
2023-01-11T22:54:21.7363204Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7363352Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7363704Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7363839Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7364206Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7364332Z getattr(self, test_name)() 2023-01-11T22:54:21.7364691Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7364793Z fn() 2023-01-11T22:54:21.7365140Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7365269Z test(self, **param_kwargs) 2023-01-11T22:54:21.7365632Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7365760Z return func(*args, **kwargs) 2023-01-11T22:54:21.7366064Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7366180Z self.run_subtests( 2023-01-11T22:54:21.7366536Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7366701Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7367050Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7367205Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7367585Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 
2023-01-11T22:54:21.7367707Z output = model(*input) 2023-01-11T22:54:21.7368037Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7368178Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7368556Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7368733Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7369100Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7369205Z _lazy_init(state, module) 2023-01-11T22:54:21.7369560Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7369789Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7370196Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7370340Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7370675Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7370848Z return func(*args, **kwargs) 2023-01-11T22:54:21.7371240Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7371327Z p_assert( 2023-01-11T22:54:21.7371669Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7371798Z traceback.print_stack() 2023-01-11T22:54:21.7371933Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.7372152Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7372298Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7372503Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7372637Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7372990Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7373112Z self.run() 2023-01-11T22:54:21.7373323Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7373474Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7373822Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7373957Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7374318Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7374429Z getattr(self, test_name)() 2023-01-11T22:54:21.7374790Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7374891Z fn() 2023-01-11T22:54:21.7375259Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7375384Z test(self, **param_kwargs) 2023-01-11T22:54:21.7375745Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7375871Z return func(*args, **kwargs) 2023-01-11T22:54:21.7376171Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7376269Z self.run_subtests( 2023-01-11T22:54:21.7376625Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7376793Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7377160Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7377316Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7377690Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7377816Z output = model(*input) 2023-01-11T22:54:21.7378142Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7378263Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7378639Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7378816Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7379281Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7379406Z _lazy_init(state, module) 2023-01-11T22:54:21.7379762Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7379934Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7380395Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7380530Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7380880Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7381005Z return func(*args, **kwargs) 2023-01-11T22:54:21.7381384Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7381494Z p_assert( 2023-01-11T22:54:21.7381835Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in 
p_assert 2023-01-11T22:54:21.7381964Z traceback.print_stack() 2023-01-11T22:54:21.7382097Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.7382290Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7382438Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7382647Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7382802Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7383015Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7383122Z self.run() 2023-01-11T22:54:21.7383327Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7383456Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7383806Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7383940Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7384307Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7384431Z getattr(self, test_name)() 2023-01-11T22:54:21.7384798Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7384899Z fn() 2023-01-11T22:54:21.7385264Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7385371Z test(self, **param_kwargs) 2023-01-11T22:54:21.7385729Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7385858Z return func(*args, **kwargs) 2023-01-11T22:54:21.7386152Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7386270Z 
self.run_subtests( 2023-01-11T22:54:21.7386651Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7386817Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7387184Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7387320Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7387696Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7387817Z output = model(*input) 2023-01-11T22:54:21.7388146Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7388286Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7388732Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7388910Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7389281Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7389404Z _lazy_init(state, module) 2023-01-11T22:54:21.7389788Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7389963Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7390364Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7390511Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7390847Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7390980Z return func(*args, **kwargs) 2023-01-11T22:54:21.7391358Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7391464Z p_assert( 2023-01-11T22:54:21.7391782Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7391915Z traceback.print_stack() 2023-01-11T22:54:21.7392669Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7393417Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7393556Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.7393770Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7393914Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7394123Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7394274Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7394469Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7394573Z self.run() 2023-01-11T22:54:21.7394779Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7394929Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7395276Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7395415Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7395779Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7395904Z getattr(self, test_name)() 2023-01-11T22:54:21.7396245Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7396350Z fn() 2023-01-11T22:54:21.7396717Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7396843Z test(self, **param_kwargs) 2023-01-11T22:54:21.7397199Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7397327Z return func(*args, **kwargs) 2023-01-11T22:54:21.7397628Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7397818Z self.run_subtests( 2023-01-11T22:54:21.7398162Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7398325Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7398745Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7398906Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7399289Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7399412Z output = model(*input) 2023-01-11T22:54:21.7399740Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7399880Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7400244Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7400422Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7400793Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7400916Z _lazy_init(state, module) 2023-01-11T22:54:21.7401274Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7401446Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7401843Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7401988Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7402310Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7402446Z return func(*args, **kwargs) 2023-01-11T22:54:21.7402826Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7402931Z p_assert( 2023-01-11T22:54:21.7403272Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7403400Z traceback.print_stack() 2023-01-11T22:54:21.7403538Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.7403750Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7403877Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7404083Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7404236Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7404452Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7404563Z self.run() 2023-01-11T22:54:21.7404768Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7404916Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7405242Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7405380Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7405746Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7405873Z getattr(self, test_name)() 2023-01-11T22:54:21.7406237Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7406339Z fn() 2023-01-11T22:54:21.7406709Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7406835Z test(self, **param_kwargs) 2023-01-11T22:54:21.7407245Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.7407373Z return func(*args, **kwargs) 2023-01-11T22:54:21.7407671Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7407791Z self.run_subtests( 2023-01-11T22:54:21.7408193Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7408361Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7408730Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7408886Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7409243Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7409366Z output = model(*input) 2023-01-11T22:54:21.7409692Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7409834Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7410212Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7410389Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7410757Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7410882Z _lazy_init(state, module) 2023-01-11T22:54:21.7411218Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7411390Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7411791Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7411942Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.7412283Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7412410Z return func(*args, **kwargs) 2023-01-11T22:54:21.7412791Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7413034Z p_assert( 2023-01-11T22:54:21.7413385Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7413496Z traceback.print_stack() 2023-01-11T22:54:21.7413628Z File "", line 1, in 2023-01-11T22:54:21.7413841Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7413990Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7414199Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7414353Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7414568Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7414655Z self.run() 2023-01-11T22:54:21.7414857Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7415009Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7415357Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7415493Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7415858Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7415985Z getattr(self, test_name)() 2023-01-11T22:54:21.7416344Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7416513Z fn() 2023-01-11T22:54:21.7416891Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7417016Z test(self, **param_kwargs) 2023-01-11T22:54:21.7417374Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7417566Z return func(*args, **kwargs) 2023-01-11T22:54:21.7417871Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7417986Z self.run_subtests( 2023-01-11T22:54:21.7418346Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7418492Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7418858Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7419019Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7419398Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7419520Z output = model(*input) 2023-01-11T22:54:21.7419850Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7419989Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7420366Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7420526Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7420892Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7421019Z _lazy_init(state, module) 2023-01-11T22:54:21.7421375Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7421547Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7421945Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7422091Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7422435Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7422544Z return func(*args, **kwargs) 2023-01-11T22:54:21.7422920Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7423026Z p_assert( 2023-01-11T22:54:21.7423363Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7423495Z traceback.print_stack() 2023-01-11T22:54:21.7423623Z File "", line 1, in 2023-01-11T22:54:21.7423836Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7423980Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7424166Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7424319Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7424536Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7424640Z self.run() 2023-01-11T22:54:21.7424846Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7424996Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7425337Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7425453Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7425890Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7426013Z getattr(self, test_name)() 2023-01-11T22:54:21.7426375Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7426476Z fn() 2023-01-11T22:54:21.7426889Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7427020Z test(self, **param_kwargs) 2023-01-11T22:54:21.7427384Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7427492Z return func(*args, **kwargs) 2023-01-11T22:54:21.7427791Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7427913Z self.run_subtests( 2023-01-11T22:54:21.7428267Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7428431Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7428794Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7428950Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7429330Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7429435Z output = model(*input) 2023-01-11T22:54:21.7429764Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7429903Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7430282Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:54:21.7430465Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7430838Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7430961Z _lazy_init(state, module) 2023-01-11T22:54:21.7431317Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7431471Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7431872Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7432017Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7432359Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7432485Z return func(*args, **kwargs) 2023-01-11T22:54:21.7432869Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7432974Z p_assert( 2023-01-11T22:54:21.7433310Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7433418Z traceback.print_stack() 2023-01-11T22:54:21.7433549Z File "", line 1, in 2023-01-11T22:54:21.7433764Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7433909Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7434113Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7434267Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7434481Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7434587Z self.run() 2023-01-11T22:54:21.7434773Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:54:21.7434986Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7435334Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7435470Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7435830Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7436003Z getattr(self, test_name)() 2023-01-11T22:54:21.7436377Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7436477Z fn() 2023-01-11T22:54:21.7436825Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7436953Z test(self, **param_kwargs) 2023-01-11T22:54:21.7437310Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7437441Z return func(*args, **kwargs) 2023-01-11T22:54:21.7437747Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7437865Z self.run_subtests( 2023-01-11T22:54:21.7438220Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7438387Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7438736Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7438890Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7439266Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7439386Z output = model(*input) 2023-01-11T22:54:21.7439716Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7439855Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7440236Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7440411Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7440765Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7440887Z _lazy_init(state, module) 2023-01-11T22:54:21.7441242Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7441413Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7441811Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7441958Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7442296Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7442424Z return func(*args, **kwargs) 2023-01-11T22:54:21.7442785Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7442890Z p_assert( 2023-01-11T22:54:21.7443231Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7443359Z traceback.print_stack() 2023-01-11T22:54:21.7443491Z File "", line 1, in 2023-01-11T22:54:21.7443702Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7443847Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7444051Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7444248Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7444462Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7444570Z self.run() 2023-01-11T22:54:21.7444776Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7444924Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7445317Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7445459Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7445814Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7445937Z getattr(self, test_name)() 2023-01-11T22:54:21.7446300Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7446400Z fn() 2023-01-11T22:54:21.7446775Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7446900Z test(self, **param_kwargs) 2023-01-11T22:54:21.7447259Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7447386Z return func(*args, **kwargs) 2023-01-11T22:54:21.7447667Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7447783Z self.run_subtests( 2023-01-11T22:54:21.7448138Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7448303Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7448670Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7448828Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:54:21.7449207Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7449330Z output = model(*input) 2023-01-11T22:54:21.7449639Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7449779Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7450161Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7450338Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7450708Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7450832Z _lazy_init(state, module) 2023-01-11T22:54:21.7451187Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7451361Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7451762Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7451887Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7452228Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7452357Z return func(*args, **kwargs) 2023-01-11T22:54:21.7452733Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7452842Z p_assert( 2023-01-11T22:54:21.7453338Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7453467Z traceback.print_stack() 2023-01-11T22:54:21.7454220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still 
has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7455124Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7455886Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7456632Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7457376Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7458093Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7458828Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7459569Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7460302Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7461036Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7461771Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7462501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7463228Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7464112Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7464857Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7465587Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7466323Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7467055Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7467782Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7468520Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7469246Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7469977Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7470709Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7471444Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7472173Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7473012Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7473752Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7474485Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7475221Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7475950Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7476674Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7477410Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7478137Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7478863Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7479595Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7480327Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7481057Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7481886Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7482624Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7483354Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7484088Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7484814Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7485539Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7486271Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7487025Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7487753Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7488488Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7489215Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7489944Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7490782Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7491518Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7492248Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7493119Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7493856Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7494581Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7495317Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7496039Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7496765Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7497495Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7498221Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7498943Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7499817Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7500558Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7501283Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7502018Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7502743Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7503467Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7504198Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7504921Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7505646Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7506384Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7507112Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7507836Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7508667Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7509404Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7510131Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7510861Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7511589Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7512313Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7513045Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7513771Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7514496Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7515227Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7515952Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7516678Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7517508Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7518244Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7518968Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7519699Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7520422Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7521146Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7521881Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7522606Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7523334Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7524066Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7524790Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7525515Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7526314Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7527084Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7527820Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7528550Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7529274Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7529997Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7530726Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7531457Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7532182Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7533045Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7533768Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7534492Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7535308Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7536092Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7536836Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7536958Z dist init r=1, world=2 2023-01-11T22:54:21.7537293Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7537618Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7537930Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:54:21.7538238Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7538543Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7538849Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7539156Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7539458Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7539761Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7540064Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7540352Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7540655Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7540774Z dist init r=0, world=2 2023-01-11T22:54:21.7541103Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.7541420Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7541731Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7542096Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7542403Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7542751Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7543057Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7543361Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7543648Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7543951Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7544256Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.7544558Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7544664Z ok (4.914s) 2023-01-11T22:54:21.7545038Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89279 2023-01-11T22:54:21.7545264Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89280 2023-01-11T22:54:21.7545648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.7545827Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.7546211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.7546392Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.7546765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.7546942Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.7547319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.7547513Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.7547767Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.7548014Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.7548418Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.7548816Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.7549032Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.7549262Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.7550286Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.7550462Z warnings.warn( 2023-01-11T22:54:21.7551525Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.7551640Z warnings.warn( 2023-01-11T22:54:21.7551774Z File "", line 1, in 2023-01-11T22:54:21.7551992Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7552142Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7552348Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7552481Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7552697Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7552803Z self.run() 2023-01-11T22:54:21.7553012Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7553163Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7553512Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7553648Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7554016Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7554130Z getattr(self, test_name)() 2023-01-11T22:54:21.7554493Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7554592Z fn() 2023-01-11T22:54:21.7554958Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7555082Z test(self, **param_kwargs) 2023-01-11T22:54:21.7555444Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7555574Z return func(*args, **kwargs) 2023-01-11T22:54:21.7555873Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7555971Z self.run_subtests( 
2023-01-11T22:54:21.7556325Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7556493Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7556864Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7557019Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7557396Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7557524Z output = model(*input) 2023-01-11T22:54:21.7557855Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7557977Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7558358Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7558536Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7558905Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7559090Z _lazy_init(state, module) 2023-01-11T22:54:21.7559451Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7559624Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7560075Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7560208Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7560554Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7560681Z return func(*args, **kwargs) 2023-01-11T22:54:21.7561063Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7561171Z p_assert( 2023-01-11T22:54:21.7561510Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7561640Z traceback.print_stack() 2023-01-11T22:54:21.7561774Z File "", line 1, in 2023-01-11T22:54:21.7561966Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7562113Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7562321Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7562477Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7562693Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7562800Z self.run() 2023-01-11T22:54:21.7563004Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7563134Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7563476Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7563613Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7563978Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7564104Z getattr(self, test_name)() 2023-01-11T22:54:21.7564464Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7564568Z fn() 2023-01-11T22:54:21.7564940Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7565047Z test(self, **param_kwargs) 2023-01-11T22:54:21.7565408Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.7565535Z return func(*args, **kwargs) 2023-01-11T22:54:21.7565836Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7565956Z self.run_subtests( 2023-01-11T22:54:21.7566309Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7566476Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7566846Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7566984Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7567361Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7567483Z output = model(*input) 2023-01-11T22:54:21.7567811Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7567950Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7568397Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7568574Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7568943Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7569047Z _lazy_init(state, module) 2023-01-11T22:54:21.7569452Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7569627Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7570031Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7570178Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.7570516Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7570649Z return func(*args, **kwargs) 2023-01-11T22:54:21.7571027Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7571133Z p_assert( 2023-01-11T22:54:21.7571455Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7571583Z traceback.print_stack() 2023-01-11T22:54:21.7571717Z File "", line 1, in 2023-01-11T22:54:21.7571930Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7572075Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7572280Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7572433Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7572628Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7572740Z self.run() 2023-01-11T22:54:21.7573169Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7573328Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7573676Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7573812Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7574180Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7574306Z getattr(self, test_name)() 2023-01-11T22:54:21.7574652Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7574753Z fn() 2023-01-11T22:54:21.7575124Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7575252Z test(self, **param_kwargs) 2023-01-11T22:54:21.7575608Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7575736Z return func(*args, **kwargs) 2023-01-11T22:54:21.7576036Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7576150Z self.run_subtests( 2023-01-11T22:54:21.7576488Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7576650Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7577018Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7577174Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7577555Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7577763Z output = model(*input) 2023-01-11T22:54:21.7578099Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7578240Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7578600Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7578840Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7579224Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7579344Z _lazy_init(state, module) 2023-01-11T22:54:21.7579698Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7579871Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7580276Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7580422Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7580744Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7580872Z return func(*args, **kwargs) 2023-01-11T22:54:21.7581255Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7581360Z p_assert( 2023-01-11T22:54:21.7581697Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7581823Z traceback.print_stack() 2023-01-11T22:54:21.7581956Z File "", line 1, in 2023-01-11T22:54:21.7582166Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7582293Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7582501Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7582655Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7582871Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7582978Z self.run() 2023-01-11T22:54:21.7583184Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7583335Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7583663Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7583798Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7584160Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7584286Z getattr(self, test_name)() 2023-01-11T22:54:21.7584647Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7584752Z fn() 2023-01-11T22:54:21.7585119Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7585246Z test(self, **param_kwargs) 2023-01-11T22:54:21.7585587Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7585717Z return func(*args, **kwargs) 2023-01-11T22:54:21.7586015Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7586130Z self.run_subtests( 2023-01-11T22:54:21.7586482Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7586647Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7587100Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7587257Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7587643Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7587766Z output = model(*input) 2023-01-11T22:54:21.7588143Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7588290Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7588675Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:54:21.7588852Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7589220Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7589349Z _lazy_init(state, module) 2023-01-11T22:54:21.7589686Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7589856Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7590254Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7590403Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7590745Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7590874Z return func(*args, **kwargs) 2023-01-11T22:54:21.7591254Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7591360Z p_assert( 2023-01-11T22:54:21.7591677Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7591811Z traceback.print_stack() 2023-01-11T22:54:21.7591942Z File "", line 1, in 2023-01-11T22:54:21.7592153Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7592297Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7592503Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7592659Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7592875Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7592962Z self.run() 2023-01-11T22:54:21.7593166Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:54:21.7593316Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7593659Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7593800Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7594164Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7594290Z getattr(self, test_name)() 2023-01-11T22:54:21.7594652Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7594734Z fn() 2023-01-11T22:54:21.7595104Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7595231Z test(self, **param_kwargs) 2023-01-11T22:54:21.7595591Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7595717Z return func(*args, **kwargs) 2023-01-11T22:54:21.7596017Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7596199Z self.run_subtests( 2023-01-11T22:54:21.7596561Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7596707Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7597073Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7597276Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7597665Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7597788Z output = model(*input) 2023-01-11T22:54:21.7598116Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7598258Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7598632Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7598796Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7599167Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7599290Z _lazy_init(state, module) 2023-01-11T22:54:21.7599647Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7599820Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7600223Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7600369Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7600710Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7600819Z return func(*args, **kwargs) 2023-01-11T22:54:21.7601203Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7601309Z p_assert( 2023-01-11T22:54:21.7601649Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7601778Z traceback.print_stack() 2023-01-11T22:54:21.7601912Z File "", line 1, in 2023-01-11T22:54:21.7602127Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7602273Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7602459Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7602612Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7602828Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7602936Z self.run() 2023-01-11T22:54:21.7603141Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7603292Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7603634Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7603750Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7604115Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7604245Z getattr(self, test_name)() 2023-01-11T22:54:21.7604609Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7604709Z fn() 2023-01-11T22:54:21.7605080Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7605204Z test(self, **param_kwargs) 2023-01-11T22:54:21.7605565Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7605774Z return func(*args, **kwargs) 2023-01-11T22:54:21.7606072Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7606189Z self.run_subtests( 2023-01-11T22:54:21.7606547Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7606759Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7607137Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7607293Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:54:21.7607670Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7607771Z output = model(*input) 2023-01-11T22:54:21.7608103Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7608244Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7608621Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7608798Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7609167Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7609290Z _lazy_init(state, module) 2023-01-11T22:54:21.7609644Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7609795Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7610197Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7610347Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7610688Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7610815Z return func(*args, **kwargs) 2023-01-11T22:54:21.7611190Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7611295Z p_assert( 2023-01-11T22:54:21.7611634Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7611745Z traceback.print_stack() 2023-01-11T22:54:21.7612500Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still 
has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7613389Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7613528Z File "", line 1, in 2023-01-11T22:54:21.7613748Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7613892Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7614097Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7614251Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7614466Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7614572Z self.run() 2023-01-11T22:54:21.7614758Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7614993Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7615344Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7615479Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7615844Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7616034Z getattr(self, test_name)() 2023-01-11T22:54:21.7616415Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7616515Z fn() 2023-01-11T22:54:21.7616865Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7616991Z test(self, **param_kwargs) 2023-01-11T22:54:21.7617346Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7617478Z return func(*args, **kwargs) 2023-01-11T22:54:21.7617778Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7617894Z self.run_subtests( 2023-01-11T22:54:21.7618248Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7618413Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7618764Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7618916Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7619294Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7619416Z output = model(*input) 2023-01-11T22:54:21.7619742Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7619886Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7620265Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7620442Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7620794Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7620917Z _lazy_init(state, module) 2023-01-11T22:54:21.7621272Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7621445Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7621844Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7621994Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7622333Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7622460Z return func(*args, **kwargs) 2023-01-11T22:54:21.7622819Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7622924Z p_assert( 2023-01-11T22:54:21.7623264Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7623394Z traceback.print_stack() 2023-01-11T22:54:21.7623527Z File "", line 1, in 2023-01-11T22:54:21.7623737Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7623882Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7624067Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7624288Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7624503Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7624609Z self.run() 2023-01-11T22:54:21.7624813Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7624962Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7625358Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7625499Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7625852Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7625979Z getattr(self, test_name)() 2023-01-11T22:54:21.7626341Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7626443Z fn() 2023-01-11T22:54:21.7626812Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7626936Z test(self, **param_kwargs) 2023-01-11T22:54:21.7627294Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7627422Z return func(*args, **kwargs) 2023-01-11T22:54:21.7627703Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7627818Z self.run_subtests( 2023-01-11T22:54:21.7628172Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7628338Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7628706Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7628866Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7629244Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7629365Z output = model(*input) 2023-01-11T22:54:21.7629674Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7629816Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7630196Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:54:21.7630372Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7630740Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7630863Z _lazy_init(state, module) 2023-01-11T22:54:21.7631220Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7631392Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7631772Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7631920Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7632258Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7632387Z return func(*args, **kwargs) 2023-01-11T22:54:21.7632766Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7632869Z p_assert( 2023-01-11T22:54:21.7633206Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7633333Z traceback.print_stack() 2023-01-11T22:54:21.7633446Z File "", line 1, in 2023-01-11T22:54:21.7633723Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7633865Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7634071Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7634224Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7634441Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7634589Z self.run() 2023-01-11T22:54:21.7634798Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:54:21.7634929Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7635276Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7635417Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7635779Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7635889Z getattr(self, test_name)() 2023-01-11T22:54:21.7636251Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7636351Z fn() 2023-01-11T22:54:21.7636718Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7636841Z test(self, **param_kwargs) 2023-01-11T22:54:21.7637203Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7637327Z return func(*args, **kwargs) 2023-01-11T22:54:21.7637625Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7637722Z self.run_subtests( 2023-01-11T22:54:21.7638073Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7638239Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7638603Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7638755Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7639132Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7639255Z output = model(*input) 2023-01-11T22:54:21.7639580Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7639702Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7640078Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7640252Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7640622Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7640741Z _lazy_init(state, module) 2023-01-11T22:54:21.7641094Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7641268Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7641667Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7641810Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7642133Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7642260Z return func(*args, **kwargs) 2023-01-11T22:54:21.7642639Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7642802Z p_assert( 2023-01-11T22:54:21.7643145Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7643271Z traceback.print_stack() 2023-01-11T22:54:21.7643401Z File "", line 1, in 2023-01-11T22:54:21.7643594Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7643738Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7643993Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7644153Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7644368Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7644474Z self.run() 2023-01-11T22:54:21.7644680Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7644827Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7645165Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7645300Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7645663Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7645784Z getattr(self, test_name)() 2023-01-11T22:54:21.7646148Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7646250Z fn() 2023-01-11T22:54:21.7646617Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7646741Z test(self, **param_kwargs) 2023-01-11T22:54:21.7647084Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7647209Z return func(*args, **kwargs) 2023-01-11T22:54:21.7647512Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7647625Z self.run_subtests( 2023-01-11T22:54:21.7647974Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7648137Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7648505Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7648661Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:54:21.7649024Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7649145Z output = model(*input) 2023-01-11T22:54:21.7649471Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7649614Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7649991Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7650168Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7650533Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7650654Z _lazy_init(state, module) 2023-01-11T22:54:21.7650992Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7651160Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7651555Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7651699Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7652035Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7652235Z return func(*args, **kwargs) 2023-01-11T22:54:21.7652617Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7652723Z p_assert( 2023-01-11T22:54:21.7653178Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7653383Z traceback.print_stack() 2023-01-11T22:54:21.7653522Z File "", line 1, in 2023-01-11T22:54:21.7653734Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7653878Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7654081Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7654232Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7654427Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7654538Z self.run() 2023-01-11T22:54:21.7654743Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7654887Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7655237Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7655370Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7655737Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7655861Z getattr(self, test_name)() 2023-01-11T22:54:21.7656206Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7656306Z fn() 2023-01-11T22:54:21.7656671Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7656798Z test(self, **param_kwargs) 2023-01-11T22:54:21.7657156Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7657282Z return func(*args, **kwargs) 2023-01-11T22:54:21.7657578Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7657697Z self.run_subtests( 2023-01-11T22:54:21.7658036Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 
2023-01-11T22:54:21.7658198Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7658562Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7658715Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7659094Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7659218Z output = model(*input) 2023-01-11T22:54:21.7659546Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7659683Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7660045Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7660224Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7660592Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7660713Z _lazy_init(state, module) 2023-01-11T22:54:21.7661065Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7661233Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7661721Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7661867Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7662204Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7662312Z return func(*args, **kwargs) 2023-01-11T22:54:21.7662736Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7662845Z p_assert( 
2023-01-11T22:54:21.7663188Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7663316Z traceback.print_stack() 2023-01-11T22:54:21.7663446Z File "", line 1, in 2023-01-11T22:54:21.7663655Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7663787Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7663990Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7664143Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7664358Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7664460Z self.run() 2023-01-11T22:54:21.7664667Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7664815Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7665156Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7665273Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7665638Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7665763Z getattr(self, test_name)() 2023-01-11T22:54:21.7666130Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7666228Z fn() 2023-01-11T22:54:21.7666595Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7666720Z test(self, **param_kwargs) 2023-01-11T22:54:21.7667079Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7667188Z return func(*args, **kwargs) 2023-01-11T22:54:21.7667484Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7667602Z self.run_subtests( 2023-01-11T22:54:21.7667956Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7668120Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7668494Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7668646Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7669019Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7669123Z output = model(*input) 2023-01-11T22:54:21.7669452Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7669594Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7669974Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7670151Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7670517Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7670699Z _lazy_init(state, module) 2023-01-11T22:54:21.7671055Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7671207Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7671604Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7671798Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7672147Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7672274Z return func(*args, **kwargs) 2023-01-11T22:54:21.7672649Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7672751Z p_assert( 2023-01-11T22:54:21.7673090Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7673205Z traceback.print_stack() 2023-01-11T22:54:21.7673336Z File "", line 1, in 2023-01-11T22:54:21.7673548Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7673693Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7673894Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7674049Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7674264Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7674352Z self.run() 2023-01-11T22:54:21.7674558Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7674707Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7675047Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7675184Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7675543Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7675667Z getattr(self, test_name)() 2023-01-11T22:54:21.7676031Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7676113Z fn() 2023-01-11T22:54:21.7676485Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:54:21.7676611Z test(self, **param_kwargs) 2023-01-11T22:54:21.7676970Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7677094Z return func(*args, **kwargs) 2023-01-11T22:54:21.7677391Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7677511Z self.run_subtests( 2023-01-11T22:54:21.7677867Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7678013Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7678376Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7678533Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7678906Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7679028Z output = model(*input) 2023-01-11T22:54:21.7679348Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7679485Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7679864Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7680087Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7680460Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7680581Z _lazy_init(state, module) 2023-01-11T22:54:21.7680986Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7681161Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:54:21.7681563Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7681707Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7682043Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7682156Z return func(*args, **kwargs) 2023-01-11T22:54:21.7682536Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7682640Z p_assert( 2023-01-11T22:54:21.7682980Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7683107Z traceback.print_stack() 2023-01-11T22:54:21.7683237Z File "", line 1, in 2023-01-11T22:54:21.7683448Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7683592Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7683777Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7683929Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7684142Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7684248Z self.run() 2023-01-11T22:54:21.7684455Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7684602Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7684940Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7685073Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7685420Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7685543Z getattr(self, test_name)() 2023-01-11T22:54:21.7685902Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7686001Z fn() 2023-01-11T22:54:21.7686368Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7686490Z test(self, **param_kwargs) 2023-01-11T22:54:21.7686845Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7686974Z return func(*args, **kwargs) 2023-01-11T22:54:21.7687255Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7687367Z self.run_subtests( 2023-01-11T22:54:21.7687725Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7687912Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7688281Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7688433Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7688809Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7688991Z output = model(*input) 2023-01-11T22:54:21.7689307Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7689443Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7689815Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7689991Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7690407Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:54:21.7690534Z _lazy_init(state, module) 2023-01-11T22:54:21.7690894Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7691065Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7691444Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7691596Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7691934Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7692063Z return func(*args, **kwargs) 2023-01-11T22:54:21.7692437Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7692542Z p_assert( 2023-01-11T22:54:21.7693100Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7693238Z traceback.print_stack() 2023-01-11T22:54:21.7693353Z File "", line 1, in 2023-01-11T22:54:21.7693565Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7693705Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7693909Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7694065Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7694278Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7694383Z self.run() 2023-01-11T22:54:21.7694569Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7694716Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7695065Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7695200Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.7695562Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7695687Z getattr(self, test_name)() 2023-01-11T22:54:21.7696047Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7696151Z fn() 2023-01-11T22:54:21.7696497Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7696620Z test(self, **param_kwargs) 2023-01-11T22:54:21.7696976Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7697098Z return func(*args, **kwargs) 2023-01-11T22:54:21.7697399Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7697512Z self.run_subtests( 2023-01-11T22:54:21.7697863Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7698021Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7698366Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7698608Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7698989Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7699110Z output = model(*input) 2023-01-11T22:54:21.7699439Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7699639Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7700033Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7700207Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7700556Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7700676Z _lazy_init(state, module) 2023-01-11T22:54:21.7701033Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7701204Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7701599Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7701743Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7702085Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7702211Z return func(*args, **kwargs) 2023-01-11T22:54:21.7702571Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7702676Z p_assert( 2023-01-11T22:54:21.7703010Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7703135Z traceback.print_stack() 2023-01-11T22:54:21.7703269Z File "", line 1, in 2023-01-11T22:54:21.7703477Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7703621Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7703826Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7703960Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7704178Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7704283Z self.run() 
2023-01-11T22:54:21.7704487Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7704633Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7704974Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7705108Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7705473Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7705581Z getattr(self, test_name)() 2023-01-11T22:54:21.7705942Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7706040Z fn() 2023-01-11T22:54:21.7706403Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7706528Z test(self, **param_kwargs) 2023-01-11T22:54:21.7706885Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7707013Z return func(*args, **kwargs) 2023-01-11T22:54:21.7707307Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7707404Z self.run_subtests( 2023-01-11T22:54:21.7707829Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7707993Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7708352Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7708506Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7708924Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 
2023-01-11T22:54:21.7709049Z output = model(*input) 2023-01-11T22:54:21.7709381Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7709504Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7709878Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7710056Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7710424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7710542Z _lazy_init(state, module) 2023-01-11T22:54:21.7710894Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7711065Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7711461Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7711588Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7711924Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7712053Z return func(*args, **kwargs) 2023-01-11T22:54:21.7712425Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7712529Z p_assert( 2023-01-11T22:54:21.7712861Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7712987Z traceback.print_stack() 2023-01-11T22:54:21.7713118Z File "", line 1, in 2023-01-11T22:54:21.7713310Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7713455Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7713660Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7713806Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7714016Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7714120Z self.run() 2023-01-11T22:54:21.7714325Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7714457Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7714800Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7714931Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7715292Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7715416Z getattr(self, test_name)() 2023-01-11T22:54:21.7715780Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7715881Z fn() 2023-01-11T22:54:21.7716249Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7716355Z test(self, **param_kwargs) 2023-01-11T22:54:21.7716711Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7716915Z return func(*args, **kwargs) 2023-01-11T22:54:21.7717214Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7717327Z self.run_subtests( 2023-01-11T22:54:21.7717685Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7717898Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7718274Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7718410Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7718785Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7718906Z output = model(*input) 2023-01-11T22:54:21.7719230Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7719374Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7719753Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7719931Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7720298Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7720403Z _lazy_init(state, module) 2023-01-11T22:54:21.7720758Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7720930Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7721328Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7721473Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7721811Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7721934Z return func(*args, **kwargs) 2023-01-11T22:54:21.7722313Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7722400Z p_assert( 2023-01-11T22:54:21.7722740Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in 
p_assert 2023-01-11T22:54:21.7722869Z traceback.print_stack() 2023-01-11T22:54:21.7722999Z File "", line 1, in 2023-01-11T22:54:21.7723209Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7723355Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7723560Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7723712Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7723913Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7724016Z self.run() 2023-01-11T22:54:21.7724222Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7724370Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7724714Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7724847Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7725211Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7725335Z getattr(self, test_name)() 2023-01-11T22:54:21.7725678Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7725776Z fn() 2023-01-11T22:54:21.7726137Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7726322Z test(self, **param_kwargs) 2023-01-11T22:54:21.7726683Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7726810Z return func(*args, **kwargs) 2023-01-11T22:54:21.7727154Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7727271Z 
self.run_subtests( 2023-01-11T22:54:21.7727616Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7727779Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7728144Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7728298Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7728675Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7728795Z output = model(*input) 2023-01-11T22:54:21.7729122Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7729259Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7729621Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7729798Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7730164Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7730287Z _lazy_init(state, module) 2023-01-11T22:54:21.7730640Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7730815Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7731212Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7731356Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7731676Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7731806Z return func(*args, **kwargs) 2023-01-11T22:54:21.7732182Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7732282Z p_assert( 2023-01-11T22:54:21.7732621Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7732749Z traceback.print_stack() 2023-01-11T22:54:21.7733649Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7734404Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7734534Z File "", line 1, in 2023-01-11T22:54:21.7734728Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7734869Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7735073Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7735224Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7735526Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7735632Z self.run() 2023-01-11T22:54:21.7735836Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7735982Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7736314Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7736505Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7736885Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7737009Z getattr(self, test_name)() 2023-01-11T22:54:21.7737372Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7737472Z fn() 2023-01-11T22:54:21.7737841Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7737970Z test(self, **param_kwargs) 2023-01-11T22:54:21.7738310Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7738437Z return func(*args, **kwargs) 2023-01-11T22:54:21.7738734Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7738848Z self.run_subtests( 2023-01-11T22:54:21.7739202Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7739369Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7739734Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7739888Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7740253Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7740374Z output = model(*input) 2023-01-11T22:54:21.7740701Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7740839Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7741220Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7741393Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7741758Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7741881Z _lazy_init(state, module) 2023-01-11T22:54:21.7742217Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7742395Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7742796Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7742940Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7743279Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7743405Z return func(*args, **kwargs) 2023-01-11T22:54:21.7743783Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7743886Z p_assert( 2023-01-11T22:54:21.7744205Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7744332Z traceback.print_stack() 2023-01-11T22:54:21.7744459Z File "", line 1, in 2023-01-11T22:54:21.7744667Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7744869Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7745071Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7745225Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7745436Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7745524Z self.run() 2023-01-11T22:54:21.7745800Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7745950Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7746298Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7746432Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7746795Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7746919Z getattr(self, test_name)() 2023-01-11T22:54:21.7747264Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7747362Z fn() 2023-01-11T22:54:21.7747723Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7747850Z test(self, **param_kwargs) 2023-01-11T22:54:21.7748209Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.7748334Z return func(*args, **kwargs) 2023-01-11T22:54:21.7748634Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7748746Z self.run_subtests( 2023-01-11T22:54:21.7749082Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7749248Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7749612Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7749763Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7750136Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7750257Z output = model(*input) 2023-01-11T22:54:21.7750585Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7750724Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7751101Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7751260Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7751622Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7751747Z _lazy_init(state, module) 2023-01-11T22:54:21.7752101Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7752269Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7752668Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7752813Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.7753143Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7753253Z return func(*args, **kwargs) 2023-01-11T22:54:21.7753665Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7753768Z p_assert( 2023-01-11T22:54:21.7754173Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7754301Z traceback.print_stack() 2023-01-11T22:54:21.7754434Z File "", line 1, in 2023-01-11T22:54:21.7754647Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7754773Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7755023Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7755177Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7755390Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7755492Z self.run() 2023-01-11T22:54:21.7755693Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7755835Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7756179Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7756301Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7756662Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7756783Z getattr(self, test_name)() 2023-01-11T22:54:21.7757140Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7757242Z fn() 2023-01-11T22:54:21.7757607Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7757730Z test(self, **param_kwargs) 2023-01-11T22:54:21.7758082Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7758190Z return func(*args, **kwargs) 2023-01-11T22:54:21.7758485Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7758602Z self.run_subtests( 2023-01-11T22:54:21.7758954Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7759113Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7759481Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7759636Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7760011Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7760115Z output = model(*input) 2023-01-11T22:54:21.7760442Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7760580Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7760965Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7761140Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7761506Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7761628Z _lazy_init(state, module) 2023-01-11T22:54:21.7761987Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7762139Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7762536Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7762682Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7763020Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7763208Z return func(*args, **kwargs) 2023-01-11T22:54:21.7763591Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7763691Z p_assert( 2023-01-11T22:54:21.7764021Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7764131Z traceback.print_stack() 2023-01-11T22:54:21.7764311Z File "", line 1, in 2023-01-11T22:54:21.7764525Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7764671Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7764875Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7765027Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7765242Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7765347Z self.run() 2023-01-11T22:54:21.7765533Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7765677Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7766022Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7766155Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7766519Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7766642Z getattr(self, test_name)() 2023-01-11T22:54:21.7767001Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7767082Z fn() 2023-01-11T22:54:21.7767443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7767564Z test(self, **param_kwargs) 2023-01-11T22:54:21.7767926Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7768050Z return func(*args, **kwargs) 2023-01-11T22:54:21.7768346Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7768461Z self.run_subtests( 2023-01-11T22:54:21.7768815Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7768962Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7769325Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7769477Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7769851Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7769970Z output = model(*input) 2023-01-11T22:54:21.7770297Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7770435Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7770806Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:54:21.7770984Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7771333Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7771452Z _lazy_init(state, module) 2023-01-11T22:54:21.7771804Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7771970Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7772366Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7772576Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7773102Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7773231Z return func(*args, **kwargs) 2023-01-11T22:54:21.7773668Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7773782Z p_assert( 2023-01-11T22:54:21.7774123Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7774249Z traceback.print_stack() 2023-01-11T22:54:21.7774378Z File "", line 1, in 2023-01-11T22:54:21.7774586Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7774729Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7774918Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7775071Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7775283Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7775384Z self.run() 2023-01-11T22:54:21.7775584Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:54:21.7775730Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7776069Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7776200Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7776543Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7776665Z getattr(self, test_name)() 2023-01-11T22:54:21.7777021Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7777121Z fn() 2023-01-11T22:54:21.7777488Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7777613Z test(self, **param_kwargs) 2023-01-11T22:54:21.7777971Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7778099Z return func(*args, **kwargs) 2023-01-11T22:54:21.7778379Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7778495Z self.run_subtests( 2023-01-11T22:54:21.7778850Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7779013Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7779381Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7779535Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7779910Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7780031Z output = model(*input) 2023-01-11T22:54:21.7780343Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7780481Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7780858Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7781032Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7781396Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7781516Z _lazy_init(state, module) 2023-01-11T22:54:21.7781971Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7782140Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7782518Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7782661Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7783050Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7783182Z return func(*args, **kwargs) 2023-01-11T22:54:21.7783564Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7783668Z p_assert( 2023-01-11T22:54:21.7784005Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7784132Z traceback.print_stack() 2023-01-11T22:54:21.7784245Z File "", line 1, in 2023-01-11T22:54:21.7784452Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7784593Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7784793Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7784948Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7785168Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7785273Z self.run() 2023-01-11T22:54:21.7785477Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7785607Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7785946Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7786076Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7786443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7786561Z getattr(self, test_name)() 2023-01-11T22:54:21.7786923Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7787019Z fn() 2023-01-11T22:54:21.7787371Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7787497Z test(self, **param_kwargs) 2023-01-11T22:54:21.7787856Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7787984Z return func(*args, **kwargs) 2023-01-11T22:54:21.7788282Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7788423Z self.run_subtests( 2023-01-11T22:54:21.7788777Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7788941Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7789288Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7789438Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:54:21.7789816Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7789937Z output = model(*input) 2023-01-11T22:54:21.7790265Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7790404Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7790777Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7791013Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7791385Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7791490Z _lazy_init(state, module) 2023-01-11T22:54:21.7791840Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7792054Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7792462Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7792606Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7792941Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7793064Z return func(*args, **kwargs) 2023-01-11T22:54:21.7793437Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7793528Z p_assert( 2023-01-11T22:54:21.7793868Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7793996Z traceback.print_stack() 2023-01-11T22:54:21.7794748Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still 
has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7795490Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7796239Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7796981Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7797717Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7798452Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.7799183Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7799910Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7801003Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:237: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.7801185Z (rank, world_num_valid_indices[rank]) 2023-01-11T22:54:21.7802231Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.7802343Z world_indices[ 2023-01-11T22:54:21.7803358Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.7803467Z world_indices[ 2023-01-11T22:54:21.7803578Z dist init r=0, world=2 2023-01-11T22:54:21.7803910Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7804212Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7804526Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7804830Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7805136Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7805436Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7805734Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7806036Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.7806336Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7806638Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7806939Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7807234Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.7807344Z dist init r=1, world=2 2023-01-11T22:54:21.7807649Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7808021Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7808329Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7808672Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7808978Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7809279Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:54:21.7809583Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7809883Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7810185Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7810483Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7810782Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7811069Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.7811174Z ok (5.115s) 2023-01-11T22:54:21.7811558Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89362 2023-01-11T22:54:21.7811779Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89363 2023-01-11T22:54:21.7812167Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.7812339Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.7812721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.7813047Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.7813431Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.7813590Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.7813967Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.7814161Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.7814406Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.7814651Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.7815049Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.7815444Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.7815761Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.7815987Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.7817077Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.7817183Z warnings.warn( 2023-01-11T22:54:21.7818211Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.7818329Z warnings.warn( 2023-01-11T22:54:21.7818460Z File "", line 1, in 2023-01-11T22:54:21.7818679Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7818823Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7819027Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7819178Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7819395Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7819484Z self.run() 2023-01-11T22:54:21.7819689Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7819841Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7820186Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7820317Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7820675Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7820800Z getattr(self, test_name)() 2023-01-11T22:54:21.7821163Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7821246Z fn() 2023-01-11T22:54:21.7821610Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7821730Z test(self, **param_kwargs) 2023-01-11T22:54:21.7822087Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7822217Z return func(*args, **kwargs) 2023-01-11T22:54:21.7822513Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7822626Z self.run_subtests( 
2023-01-11T22:54:21.7822976Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7823126Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7823494Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7823648Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7824025Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7824145Z output = model(*input) 2023-01-11T22:54:21.7824469Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7824671Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7825058Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7825217Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7825634Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7825760Z _lazy_init(state, module) 2023-01-11T22:54:21.7826122Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7826288Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7826683Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7826825Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7827165Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7827277Z return func(*args, **kwargs) 2023-01-11T22:54:21.7827653Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7827756Z p_assert( 2023-01-11T22:54:21.7828093Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7828219Z traceback.print_stack() 2023-01-11T22:54:21.7828346Z File "", line 1, in 2023-01-11T22:54:21.7828555Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7828697Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7828883Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7829038Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7829249Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7829354Z self.run() 2023-01-11T22:54:21.7829557Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7829703Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7830048Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7830166Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7830525Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7830649Z getattr(self, test_name)() 2023-01-11T22:54:21.7831011Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7831104Z fn() 2023-01-11T22:54:21.7831473Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7831598Z test(self, **param_kwargs) 2023-01-11T22:54:21.7831957Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.7832067Z return func(*args, **kwargs) 2023-01-11T22:54:21.7832363Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7832477Z self.run_subtests( 2023-01-11T22:54:21.7832831Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7832991Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7833355Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7833570Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7833949Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7834069Z output = model(*input) 2023-01-11T22:54:21.7834377Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7834516Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7834945Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7835127Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7835498Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7835615Z _lazy_init(state, module) 2023-01-11T22:54:21.7835968Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7836136Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7836520Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7836662Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.7836997Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7837122Z return func(*args, **kwargs) 2023-01-11T22:54:21.7837498Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7837600Z p_assert( 2023-01-11T22:54:21.7837934Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7838060Z traceback.print_stack() 2023-01-11T22:54:21.7838174Z File "", line 1, in 2023-01-11T22:54:21.7838388Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7838529Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7838730Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7838880Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7839091Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7839200Z self.run() 2023-01-11T22:54:21.7839387Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7839532Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7839872Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7840002Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7840361Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7840489Z getattr(self, test_name)() 2023-01-11T22:54:21.7840849Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7840950Z fn() 2023-01-11T22:54:21.7841300Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7841424Z test(self, **param_kwargs) 2023-01-11T22:54:21.7841783Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7841909Z return func(*args, **kwargs) 2023-01-11T22:54:21.7842206Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7842317Z self.run_subtests( 2023-01-11T22:54:21.7842669Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7842908Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7843262Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7843413Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7843839Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7843965Z output = model(*input) 2023-01-11T22:54:21.7844295Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7844433Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7844809Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7844985Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7845338Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7845461Z _lazy_init(state, module) 2023-01-11T22:54:21.7845811Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7845979Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7846379Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7846523Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7846867Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7846993Z return func(*args, **kwargs) 2023-01-11T22:54:21.7847352Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7847459Z p_assert( 2023-01-11T22:54:21.7847792Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7847919Z traceback.print_stack() 2023-01-11T22:54:21.7848048Z File "", line 1, in 2023-01-11T22:54:21.7848255Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7848395Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7848600Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7848737Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7848949Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7849054Z self.run() 2023-01-11T22:54:21.7849255Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7849400Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7849746Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7849879Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7850227Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7850349Z getattr(self, test_name)() 2023-01-11T22:54:21.7850709Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7850807Z fn() 2023-01-11T22:54:21.7851169Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7851291Z test(self, **param_kwargs) 2023-01-11T22:54:21.7851649Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7851774Z return func(*args, **kwargs) 2023-01-11T22:54:21.7852116Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7852227Z self.run_subtests( 2023-01-11T22:54:21.7852586Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7852745Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7853328Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7853495Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7853881Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7854002Z output = model(*input) 2023-01-11T22:54:21.7854325Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7854454Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7854833Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:54:21.7855006Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7855374Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7855500Z _lazy_init(state, module) 2023-01-11T22:54:21.7855852Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7856016Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7856408Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7856535Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7856869Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7856996Z return func(*args, **kwargs) 2023-01-11T22:54:21.7857368Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7857469Z p_assert( 2023-01-11T22:54:21.7857803Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7857931Z traceback.print_stack() 2023-01-11T22:54:21.7858060Z File "", line 1, in 2023-01-11T22:54:21.7858253Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7858395Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7858594Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7858744Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7858957Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7859066Z self.run() 2023-01-11T22:54:21.7859269Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:54:21.7859399Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7859736Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7859867Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7860233Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7860357Z getattr(self, test_name)() 2023-01-11T22:54:21.7860714Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7860811Z fn() 2023-01-11T22:54:21.7861175Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7861367Z test(self, **param_kwargs) 2023-01-11T22:54:21.7861729Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7861852Z return func(*args, **kwargs) 2023-01-11T22:54:21.7862146Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7862307Z self.run_subtests( 2023-01-11T22:54:21.7862675Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7862837Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7863204Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7863342Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7863716Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7863844Z output = model(*input) 2023-01-11T22:54:21.7864169Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7864302Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7864675Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7864850Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7865213Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7865317Z _lazy_init(state, module) 2023-01-11T22:54:21.7865669Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7865839Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7866237Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7866385Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7866717Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7866840Z return func(*args, **kwargs) 2023-01-11T22:54:21.7867222Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7867309Z p_assert( 2023-01-11T22:54:21.7867647Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7867772Z traceback.print_stack() 2023-01-11T22:54:21.7867904Z File "", line 1, in 2023-01-11T22:54:21.7868116Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7868261Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7868460Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7868609Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7868804Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7868906Z self.run() 2023-01-11T22:54:21.7869111Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7869256Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7869592Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7869723Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7870081Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7870189Z getattr(self, test_name)() 2023-01-11T22:54:21.7870620Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7870719Z fn() 2023-01-11T22:54:21.7871086Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7871208Z test(self, **param_kwargs) 2023-01-11T22:54:21.7871612Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7871743Z return func(*args, **kwargs) 2023-01-11T22:54:21.7872041Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7872138Z self.run_subtests( 2023-01-11T22:54:21.7872494Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7872654Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7873024Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7873173Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:54:21.7873545Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7873657Z output = model(*input) 2023-01-11T22:54:21.7873985Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7874122Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7874484Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7874657Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7875024Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7875149Z _lazy_init(state, module) 2023-01-11T22:54:21.7875499Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7875669Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7876057Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7876202Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7876524Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7876649Z return func(*args, **kwargs) 2023-01-11T22:54:21.7877025Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7877126Z p_assert( 2023-01-11T22:54:21.7877464Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7877593Z traceback.print_stack() 2023-01-11T22:54:21.7878344Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still 
has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7879091Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.7879220Z File "", line 1, in 2023-01-11T22:54:21.7879416Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7879559Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7879828Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7879984Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7880200Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7880304Z self.run() 2023-01-11T22:54:21.7880508Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7880697Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7881038Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7881172Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7881535Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7881661Z getattr(self, test_name)() 2023-01-11T22:54:21.7882019Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7882119Z fn() 2023-01-11T22:54:21.7882485Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7882608Z test(self, **param_kwargs) 2023-01-11T22:54:21.7882948Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7883078Z return func(*args, **kwargs) 2023-01-11T22:54:21.7883375Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7883488Z self.run_subtests( 2023-01-11T22:54:21.7883840Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7884001Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7884370Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7884524Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7884885Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7885006Z output = model(*input) 2023-01-11T22:54:21.7885330Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7885469Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7885849Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7886021Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7886389Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7886513Z _lazy_init(state, module) 2023-01-11T22:54:21.7886850Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7887017Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7887413Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7887559Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7887898Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7888024Z return func(*args, **kwargs) 2023-01-11T22:54:21.7888400Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7888502Z p_assert( 2023-01-11T22:54:21.7888822Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7889080Z traceback.print_stack() 2023-01-11T22:54:21.7889209Z File "", line 1, in 2023-01-11T22:54:21.7889422Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7889565Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7889766Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7889960Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7890177Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7890266Z self.run() 2023-01-11T22:54:21.7890469Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7890613Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7890965Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7891101Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7891460Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7891582Z getattr(self, test_name)() 2023-01-11T22:54:21.7891924Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7892025Z fn() 2023-01-11T22:54:21.7892388Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7892513Z test(self, **param_kwargs) 2023-01-11T22:54:21.7893050Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7893186Z return func(*args, **kwargs) 2023-01-11T22:54:21.7893481Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7893602Z self.run_subtests( 2023-01-11T22:54:21.7893943Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7894108Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7894473Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7894628Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7895004Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7895122Z output = model(*input) 2023-01-11T22:54:21.7895442Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7895581Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7895939Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:54:21.7896118Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7896487Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7896605Z _lazy_init(state, module) 2023-01-11T22:54:21.7896955Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7897125Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7897522Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7897665Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7898000Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7898111Z return func(*args, **kwargs) 2023-01-11T22:54:21.7898587Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7898687Z p_assert( 2023-01-11T22:54:21.7899023Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7899148Z traceback.print_stack() 2023-01-11T22:54:21.7899276Z File "", line 1, in 2023-01-11T22:54:21.7899544Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7899681Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7899880Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7900029Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7900241Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7900342Z self.run() 2023-01-11T22:54:21.7900546Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:54:21.7900697Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7901041Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7901158Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7901520Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7901649Z getattr(self, test_name)() 2023-01-11T22:54:21.7902004Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7902101Z fn() 2023-01-11T22:54:21.7902469Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7902587Z test(self, **param_kwargs) 2023-01-11T22:54:21.7910443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7910631Z return func(*args, **kwargs) 2023-01-11T22:54:21.7910947Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7911065Z self.run_subtests( 2023-01-11T22:54:21.7911445Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7911617Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7911993Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7912150Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7912533Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7912657Z output = model(*input) 2023-01-11T22:54:21.7912991Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7913132Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7913498Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7913675Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7914048Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7914173Z _lazy_init(state, module) 2023-01-11T22:54:21.7914532Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7914701Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7915105Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7915381Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7915732Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7915841Z return func(*args, **kwargs) 2023-01-11T22:54:21.7916215Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7916316Z p_assert( 2023-01-11T22:54:21.7916713Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7916848Z traceback.print_stack() 2023-01-11T22:54:21.7916976Z File "", line 1, in 2023-01-11T22:54:21.7917236Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7917364Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7917562Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7917718Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7917927Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7918030Z self.run() 2023-01-11T22:54:21.7918231Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7918376Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7918730Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7918849Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7919215Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7919337Z getattr(self, test_name)() 2023-01-11T22:54:21.7919697Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7919799Z fn() 2023-01-11T22:54:21.7920162Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7920288Z test(self, **param_kwargs) 2023-01-11T22:54:21.7920647Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7920755Z return func(*args, **kwargs) 2023-01-11T22:54:21.7921057Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7921169Z self.run_subtests( 2023-01-11T22:54:21.7921526Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7921688Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7922047Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7922205Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:54:21.7922580Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7922683Z output = model(*input) 2023-01-11T22:54:21.7923012Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7923155Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7923534Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7923711Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7924073Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7924192Z _lazy_init(state, module) 2023-01-11T22:54:21.7924546Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7924763Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7925168Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7925310Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7925692Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7925827Z return func(*args, **kwargs) 2023-01-11T22:54:21.7926209Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7926313Z p_assert( 2023-01-11T22:54:21.7926650Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7926761Z traceback.print_stack() 2023-01-11T22:54:21.7926895Z File "", line 1, in 2023-01-11T22:54:21.7927103Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7927246Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7927452Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7927604Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7927821Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7927910Z self.run() 2023-01-11T22:54:21.7928113Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7928257Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7928599Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7928733Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7929091Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7929221Z getattr(self, test_name)() 2023-01-11T22:54:21.7929580Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7929663Z fn() 2023-01-11T22:54:21.7930026Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7930154Z test(self, **param_kwargs) 2023-01-11T22:54:21.7930514Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7930641Z return func(*args, **kwargs) 2023-01-11T22:54:21.7930939Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7931055Z self.run_subtests( 2023-01-11T22:54:21.7931409Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 
2023-01-11T22:54:21.7931558Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7931924Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7932080Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7932462Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7932583Z output = model(*input) 2023-01-11T22:54:21.7933146Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7933299Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7933690Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7933851Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7934337Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7934460Z _lazy_init(state, module) 2023-01-11T22:54:21.7934817Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7934988Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7935446Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7935597Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7935943Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7936067Z return func(*args, **kwargs) 2023-01-11T22:54:21.7936428Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7936536Z p_assert( 
2023-01-11T22:54:21.7936871Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7936995Z traceback.print_stack() 2023-01-11T22:54:21.7937125Z File "", line 1, in 2023-01-11T22:54:21.7937331Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7937479Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7937667Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7937814Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7938026Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7938129Z self.run() 2023-01-11T22:54:21.7938332Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7938478Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7938824Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7938956Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7939303Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7939427Z getattr(self, test_name)() 2023-01-11T22:54:21.7939790Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7939889Z fn() 2023-01-11T22:54:21.7940257Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7940382Z test(self, **param_kwargs) 2023-01-11T22:54:21.7940738Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7940861Z return func(*args, **kwargs) 2023-01-11T22:54:21.7941145Z File 
"/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7941260Z self.run_subtests( 2023-01-11T22:54:21.7941616Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7941777Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7942144Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7942300Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7942678Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7942797Z output = model(*input) 2023-01-11T22:54:21.7943108Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7943309Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7943688Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7943860Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7944228Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7944397Z _lazy_init(state, module) 2023-01-11T22:54:21.7944760Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7944928Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7945308Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7945452Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7945798Z File 
"/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7945923Z return func(*args, **kwargs) 2023-01-11T22:54:21.7946302Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7946406Z p_assert( 2023-01-11T22:54:21.7946748Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7946876Z traceback.print_stack() 2023-01-11T22:54:21.7946989Z File "", line 1, in 2023-01-11T22:54:21.7947199Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7947343Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7947545Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7947702Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7947919Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7948024Z self.run() 2023-01-11T22:54:21.7948210Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7948362Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7948703Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7948841Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7949205Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7949330Z getattr(self, test_name)() 2023-01-11T22:54:21.7949692Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7949792Z fn() 2023-01-11T22:54:21.7950143Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 
2023-01-11T22:54:21.7950273Z test(self, **param_kwargs) 2023-01-11T22:54:21.7950633Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7950759Z return func(*args, **kwargs) 2023-01-11T22:54:21.7951059Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7951177Z self.run_subtests( 2023-01-11T22:54:21.7951531Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7951693Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7952040Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7952195Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7952641Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7952764Z output = model(*input) 2023-01-11T22:54:21.7953091Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7953228Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7953650Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7953832Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7954188Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7954307Z _lazy_init(state, module) 2023-01-11T22:54:21.7954663Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7954835Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:54:21.7955241Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7955386Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7955724Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7955852Z return func(*args, **kwargs) 2023-01-11T22:54:21.7956232Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7956321Z p_assert( 2023-01-11T22:54:21.7956661Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7956790Z traceback.print_stack() 2023-01-11T22:54:21.7956920Z File "", line 1, in 2023-01-11T22:54:21.7957131Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7957280Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7957486Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7957620Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7957832Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7957933Z self.run() 2023-01-11T22:54:21.7958140Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7958288Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7958628Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7958761Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7959120Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7959228Z getattr(self, test_name)() 2023-01-11T22:54:21.7959595Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7959694Z fn() 2023-01-11T22:54:21.7960061Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7960186Z test(self, **param_kwargs) 2023-01-11T22:54:21.7960548Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7960674Z return func(*args, **kwargs) 2023-01-11T22:54:21.7960969Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7961066Z self.run_subtests( 2023-01-11T22:54:21.7961421Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7961646Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7962017Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7962169Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7962539Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7962660Z output = model(*input) 2023-01-11T22:54:21.7963037Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7963167Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7963549Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7963726Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7964095Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in 
_root_pre_forward 2023-01-11T22:54:21.7964222Z _lazy_init(state, module) 2023-01-11T22:54:21.7964576Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7964743Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7965144Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7965272Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7965612Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7965736Z return func(*args, **kwargs) 2023-01-11T22:54:21.7966113Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7966218Z p_assert( 2023-01-11T22:54:21.7966562Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7966685Z traceback.print_stack() 2023-01-11T22:54:21.7966813Z File "", line 1, in 2023-01-11T22:54:21.7967004Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7967587Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7967799Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7967953Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7968162Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7968264Z self.run() 2023-01-11T22:54:21.7968463Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7968594Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7968932Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7969070Z 
self.run_test(test_name, pipe) 2023-01-11T22:54:21.7969430Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7969553Z getattr(self, test_name)() 2023-01-11T22:54:21.7969917Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7970022Z fn() 2023-01-11T22:54:21.7970386Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7970492Z test(self, **param_kwargs) 2023-01-11T22:54:21.7970852Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7970978Z return func(*args, **kwargs) 2023-01-11T22:54:21.7971275Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7971455Z self.run_subtests( 2023-01-11T22:54:21.7971817Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7971980Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7972389Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7972533Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7972917Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7973040Z output = model(*input) 2023-01-11T22:54:21.7973523Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7973664Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7974053Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7974229Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7974598Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7974702Z _lazy_init(state, module) 2023-01-11T22:54:21.7975061Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7975230Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7975630Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7975776Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7976112Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7976242Z return func(*args, **kwargs) 2023-01-11T22:54:21.7976622Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7976726Z p_assert( 2023-01-11T22:54:21.7977045Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7977174Z traceback.print_stack() 2023-01-11T22:54:21.7977305Z File "", line 1, in 2023-01-11T22:54:21.7977517Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7977660Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7977863Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7978015Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7978212Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7978320Z self.run() 
2023-01-11T22:54:21.7978527Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7978675Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7979015Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7979149Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7979516Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7979641Z getattr(self, test_name)() 2023-01-11T22:54:21.7979985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7980086Z fn() 2023-01-11T22:54:21.7980454Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7980581Z test(self, **param_kwargs) 2023-01-11T22:54:21.7981048Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7981174Z return func(*args, **kwargs) 2023-01-11T22:54:21.7981473Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7981589Z self.run_subtests( 2023-01-11T22:54:21.7981987Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7982161Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7982531Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7982685Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7983061Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 
2023-01-11T22:54:21.7983186Z output = model(*input) 2023-01-11T22:54:21.7983512Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7983647Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7984007Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7984183Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7984550Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7984670Z _lazy_init(state, module) 2023-01-11T22:54:21.7985025Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7985195Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7985592Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7985741Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7986061Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7986188Z return func(*args, **kwargs) 2023-01-11T22:54:21.7986569Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7986677Z p_assert( 2023-01-11T22:54:21.7987014Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.7987143Z traceback.print_stack() 2023-01-11T22:54:21.7987272Z File "", line 1, in 2023-01-11T22:54:21.7987481Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7987607Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7987813Z File 
"/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7987962Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7988176Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7988282Z self.run() 2023-01-11T22:54:21.7988484Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7988633Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7988960Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7989091Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7989475Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7989596Z getattr(self, test_name)() 2023-01-11T22:54:21.7989960Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.7990116Z fn() 2023-01-11T22:54:21.7990485Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.7990606Z test(self, **param_kwargs) 2023-01-11T22:54:21.7990943Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.7991114Z return func(*args, **kwargs) 2023-01-11T22:54:21.7991423Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.7991538Z self.run_subtests( 2023-01-11T22:54:21.7991894Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.7992057Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.7992423Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.7992581Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.7992942Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.7993063Z output = model(*input) 2023-01-11T22:54:21.7993394Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.7993535Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.7993914Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.7994090Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.7994462Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.7994583Z _lazy_init(state, module) 2023-01-11T22:54:21.7994924Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.7995095Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.7995494Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.7995638Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.7995979Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.7996107Z return func(*args, **kwargs) 2023-01-11T22:54:21.7996485Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.7996585Z p_assert( 2023-01-11T22:54:21.7996905Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in 
p_assert 2023-01-11T22:54:21.7997039Z traceback.print_stack() 2023-01-11T22:54:21.7997171Z File "", line 1, in 2023-01-11T22:54:21.7997381Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.7997528Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.7997730Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.7997883Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.7998099Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.7998187Z self.run() 2023-01-11T22:54:21.7998385Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.7998530Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.7998873Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.7999006Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.7999439Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.7999562Z getattr(self, test_name)() 2023-01-11T22:54:21.7999921Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8000003Z fn() 2023-01-11T22:54:21.8000414Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8000546Z test(self, **param_kwargs) 2023-01-11T22:54:21.8000910Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8001036Z return func(*args, **kwargs) 2023-01-11T22:54:21.8001333Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.8001449Z 
self.run_subtests( 2023-01-11T22:54:21.8001801Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8001947Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8002312Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8002467Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8002847Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8002970Z output = model(*input) 2023-01-11T22:54:21.8003296Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8003435Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8003811Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8003974Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8004343Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8004461Z _lazy_init(state, module) 2023-01-11T22:54:21.8004819Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8004992Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8005389Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8005532Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8005868Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8005976Z return func(*args, **kwargs) 2023-01-11T22:54:21.8006358Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8006460Z p_assert( 2023-01-11T22:54:21.8006798Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8006924Z traceback.print_stack() 2023-01-11T22:54:21.8007675Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8008421Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8008604Z File "", line 1, in 2023-01-11T22:54:21.8008813Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8008940Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8009138Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8009287Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8009541Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8009648Z self.run() 2023-01-11T22:54:21.8009851Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8010000Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8010347Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8010464Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8010833Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8010958Z getattr(self, test_name)() 2023-01-11T22:54:21.8011319Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8011417Z fn() 2023-01-11T22:54:21.8011784Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8011907Z test(self, **param_kwargs) 2023-01-11T22:54:21.8012262Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8012370Z return func(*args, **kwargs) 2023-01-11T22:54:21.8012669Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.8012781Z self.run_subtests( 2023-01-11T22:54:21.8013279Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8013443Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8013812Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8013963Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8014343Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8014447Z output = model(*input) 2023-01-11T22:54:21.8014774Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8014912Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8015287Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8015464Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8015829Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8015948Z _lazy_init(state, module) 2023-01-11T22:54:21.8016296Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8016452Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8016844Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8016988Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8017324Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8017449Z return func(*args, **kwargs) 2023-01-11T22:54:21.8017825Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8018009Z p_assert( 2023-01-11T22:54:21.8018350Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8018461Z traceback.print_stack() 2023-01-11T22:54:21.8018591Z File "", line 1, in 2023-01-11T22:54:21.8018802Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8019008Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8019218Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8019371Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8019584Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8019683Z self.run() 2023-01-11T22:54:21.8019869Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8020016Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8020362Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8020493Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8020858Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8020980Z getattr(self, test_name)() 2023-01-11T22:54:21.8021339Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8021421Z fn() 2023-01-11T22:54:21.8021785Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8021906Z test(self, **param_kwargs) 2023-01-11T22:54:21.8022259Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.8022387Z return func(*args, **kwargs) 2023-01-11T22:54:21.8022677Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.8022786Z self.run_subtests( 2023-01-11T22:54:21.8023136Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8023285Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8023649Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8023798Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8024171Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8024284Z output = model(*input) 2023-01-11T22:54:21.8024606Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8024747Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8025121Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8025294Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8025648Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8025763Z _lazy_init(state, module) 2023-01-11T22:54:21.8026117Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8026286Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8026682Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8026883Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.8027222Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8027344Z return func(*args, **kwargs) 2023-01-11T22:54:21.8027701Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8027802Z p_assert( 2023-01-11T22:54:21.8028179Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8028311Z traceback.print_stack() 2023-01-11T22:54:21.8028441Z File "", line 1, in 2023-01-11T22:54:21.8028647Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8028787Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8028974Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8029123Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8029342Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8029447Z self.run() 2023-01-11T22:54:21.8029649Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8029794Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8030140Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8030274Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8030620Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8030742Z getattr(self, test_name)() 2023-01-11T22:54:21.8031103Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8031199Z fn() 2023-01-11T22:54:21.8031559Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8031688Z test(self, **param_kwargs) 2023-01-11T22:54:21.8032045Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8032169Z return func(*args, **kwargs) 2023-01-11T22:54:21.8032452Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.8032563Z self.run_subtests( 2023-01-11T22:54:21.8032916Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8033073Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8033437Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8033590Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8033971Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8034092Z output = model(*input) 2023-01-11T22:54:21.8034402Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8034540Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8034923Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8035099Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8035459Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8035577Z _lazy_init(state, module) 2023-01-11T22:54:21.8035926Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8036153Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8036541Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8036680Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8037017Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8037221Z return func(*args, **kwargs) 2023-01-11T22:54:21.8037614Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8037717Z p_assert( 2023-01-11T22:54:21.8038049Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8038173Z traceback.print_stack() 2023-01-11T22:54:21.8038287Z File "", line 1, in 2023-01-11T22:54:21.8038500Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8038642Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8038841Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8038986Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8039192Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8039291Z self.run() 2023-01-11T22:54:21.8039492Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8039623Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8039964Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8040093Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8040455Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8040584Z getattr(self, test_name)() 2023-01-11T22:54:21.8040944Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8041039Z fn() 2023-01-11T22:54:21.8041388Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8041508Z test(self, **param_kwargs) 2023-01-11T22:54:21.8041868Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8041994Z return func(*args, **kwargs) 2023-01-11T22:54:21.8042290Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.8042402Z self.run_subtests( 2023-01-11T22:54:21.8042751Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8042915Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8043278Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8043416Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8043792Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8043908Z output = model(*input) 2023-01-11T22:54:21.8044232Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8044372Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8044745Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 
2023-01-11T22:54:21.8044918Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8045282Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8045463Z _lazy_init(state, module) 2023-01-11T22:54:21.8045820Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8045986Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8046432Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8046578Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8046924Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8047048Z return func(*args, **kwargs) 2023-01-11T22:54:21.8047424Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8047516Z p_assert( 2023-01-11T22:54:21.8047851Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8047976Z traceback.print_stack() 2023-01-11T22:54:21.8048105Z File "", line 1, in 2023-01-11T22:54:21.8048312Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8048451Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8048655Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8048791Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8049002Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8049108Z self.run() 2023-01-11T22:54:21.8049310Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 
2023-01-11T22:54:21.8049456Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8049797Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8049936Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8050297Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8050405Z getattr(self, test_name)() 2023-01-11T22:54:21.8050763Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8050855Z fn() 2023-01-11T22:54:21.8051214Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8051335Z test(self, **param_kwargs) 2023-01-11T22:54:21.8051689Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8051812Z return func(*args, **kwargs) 2023-01-11T22:54:21.8052102Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.8052204Z self.run_subtests( 2023-01-11T22:54:21.8052551Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8052711Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8053318Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8053475Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8053855Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8054009Z output = model(*input) 2023-01-11T22:54:21.8054336Z File 
"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8054458Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8054934Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8055108Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8055472Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8055589Z _lazy_init(state, module) 2023-01-11T22:54:21.8055998Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8056174Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8056571Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8056700Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8057033Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8057161Z return func(*args, **kwargs) 2023-01-11T22:54:21.8057532Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8057636Z p_assert( 2023-01-11T22:54:21.8057971Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8058096Z traceback.print_stack() 2023-01-11T22:54:21.8058225Z File "", line 1, in 2023-01-11T22:54:21.8058419Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8058566Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8058771Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8058921Z 
return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8059130Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8059242Z self.run() 2023-01-11T22:54:21.8059440Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8059584Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8059912Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8060045Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8060413Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8060529Z getattr(self, test_name)() 2023-01-11T22:54:21.8060885Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8060983Z fn() 2023-01-11T22:54:21.8061343Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8061456Z test(self, **param_kwargs) 2023-01-11T22:54:21.8061805Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8061928Z return func(*args, **kwargs) 2023-01-11T22:54:21.8062221Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 128, in test_nested_wrapped_model_single_iteration_mixed_precision 2023-01-11T22:54:21.8062335Z self.run_subtests( 2023-01-11T22:54:21.8062685Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8062849Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8063212Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8063367Z fsdp_loss = 
self._train_for_several_steps( 2023-01-11T22:54:21.8063718Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8063900Z output = model(*input) 2023-01-11T22:54:21.8064231Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8064367Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8064743Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8064967Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8065352Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8065469Z _lazy_init(state, module) 2023-01-11T22:54:21.8065803Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8065967Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8066364Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8066509Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8066842Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8066965Z return func(*args, **kwargs) 2023-01-11T22:54:21.8067336Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8067435Z p_assert( 2023-01-11T22:54:21.8067753Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8067878Z traceback.print_stack() 2023-01-11T22:54:21.8068624Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still 
has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8069371Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8070115Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8070856Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8071589Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8072321Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8073054Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8073850Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8074922Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:237: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.8075063Z (rank, world_num_valid_indices[rank]) 2023-01-11T22:54:21.8076099Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.8076211Z world_indices[ 2023-01-11T22:54:21.8077225Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_exec_order_utils.py:259: UserWarning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.8077331Z world_indices[ 2023-01-11T22:54:21.8077441Z dist init r=0, world=2 2023-01-11T22:54:21.8077766Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8078084Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8078380Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8078685Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8078990Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8079288Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8079587Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8079890Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.8080191Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8080490Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8080790Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8081148Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8081260Z dist init r=1, world=2 2023-01-11T22:54:21.8081621Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8081925Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8082231Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8082533Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8082838Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8083139Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:54:21.8083434Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8083734Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8084027Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8084327Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8084624Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8084923Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8085011Z ok (5.114s) 2023-01-11T22:54:21.8085336Z test_transformer_offload_false_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89445 2023-01-11T22:54:21.8085556Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89446 2023-01-11T22:54:21.8085943Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.8086120Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.8086499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.8086684Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.8087047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.8087222Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.8087588Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.8087774Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.8088020Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.8088322Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.8088730Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.8089121Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.8089390Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.8089625Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.8089862Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8090104Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8091599Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.8091735Z warnings.warn( 2023-01-11T22:54:21.8092765Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.8093007Z warnings.warn( 2023-01-11T22:54:21.8093253Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8093482Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8094239Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8094981Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8095719Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8096467Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8096704Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8096938Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8097168Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8097384Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8098244Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8099040Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8099794Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8100539Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8100775Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8101011Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8101238Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8101474Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8102206Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8102943Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8103681Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8104413Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8104651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8104882Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8105110Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8105324Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8106068Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8106796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8107652Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8108396Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8109120Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8109853Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8110578Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8111302Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8112035Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8112760Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8113481Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8114208Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8114940Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8115659Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8115945Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8116174Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8116443Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8116677Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8116903Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8117126Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8117864Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8118598Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8119330Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8120061Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8120789Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8121501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8122228Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8122964Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8123691Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8124418Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8125264Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8126004Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8126729Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8127459Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8127688Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8127914Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8128140Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8128370Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8129351Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:795: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.8129531Z return torch._VF.split_with_sizes(self, split_size, dim) 2023-01-11T22:54:21.8130498Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:795: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.8130670Z return torch._VF.split_with_sizes(self, split_size, dim) 2023-01-11T22:54:21.8130898Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8131120Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8131335Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8131562Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8132304Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8133191Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8134096Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8134841Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8135576Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8136307Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8137029Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8137754Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8138493Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8139218Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8139945Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8140677Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8141403Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8142125Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8142417Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8142651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8143440Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8144172Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8144900Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8145625Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8146351Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8147082Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8147808Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8148529Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8149256Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8149980Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8150705Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8151492Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8152258Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8152990Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8153718Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8154440Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8155162Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8155888Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8156619Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8157346Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8158072Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8158796Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8159519Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8160303Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8161069Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8161802Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8162530Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8163251Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8163975Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8164702Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8165428Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8166151Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8166874Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8167601Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8168327Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8169125Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8169895Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8170634Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8170858Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8171090Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8171322Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8171557Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8172293Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8173162Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8173908Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8174641Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8174876Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8175109Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8175346Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8175578Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8175791Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8176018Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8176246Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8176473Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8177208Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8178046Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8178829Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8179571Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8180306Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8181034Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8181759Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8182484Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8183213Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8183937Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8184668Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8185396Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8186125Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8186913Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8187148Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8187426Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8187660Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8187891Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8187987Z dist init r=1, world=2 2023-01-11T22:54:21.8188098Z dist init r=0, world=2 2023-01-11T22:54:21.8188200Z ok (8.521s) 2023-01-11T22:54:21.8188522Z test_transformer_offload_false_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89528 2023-01-11T22:54:21.8188744Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89529 2023-01-11T22:54:21.8189120Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.8189294Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.8189661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.8189853Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.8190220Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.8190397Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.8190800Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.8190995Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.8191242Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.8191489Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.8191891Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.8192273Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.8192502Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.8192728Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.8192965Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8193196Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8194218Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.8194333Z warnings.warn( 2023-01-11T22:54:21.8195345Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.8195518Z warnings.warn( 2023-01-11T22:54:21.8195754Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8195986Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8196786Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8197544Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8198274Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8199014Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8199253Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8199488Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8199723Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8199956Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8200701Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8201438Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8202174Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8202914Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8203149Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8203383Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8203616Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8203849Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8204116Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8204344Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8205130Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8205872Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8206611Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8207354Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8207587Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8207822Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8208054Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8208286Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8208514Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8208741Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8208952Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8209183Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8209926Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8210662Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8211408Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8212144Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8212375Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8212659Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8213165Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8213406Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8213635Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8213923Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8214660Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8215391Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8216135Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8216865Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8217102Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8217338Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8217566Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8217795Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8218535Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8219268Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8220005Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8220739Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8221463Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8222272Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8223045Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8223787Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8224517Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8225249Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8225975Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8226703Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8226942Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8227179Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8227407Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8227620Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8227846Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8228072Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8229070Z /opt/conda/lib/python3.10/site-packages/torch/nn/parameter.py:55: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 2023-01-11T22:54:21.8229310Z result = type(self)(self.data.clone(memory_format=torch.preserve_format), self.requires_grad) 2023-01-11T22:54:21.8230052Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8230786Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8231090Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8231320Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8231588Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8231824Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8232568Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8233308Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8234043Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8234772Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8234992Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8235225Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8235455Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8235686Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8235918Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8236144Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8236883Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8237625Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8238363Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8239100Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8239268Z dist init r=0, world=2 2023-01-11T22:54:21.8239380Z dist init r=1, world=2 2023-01-11T22:54:21.8239464Z ok (8.821s) 2023-01-11T22:54:21.8239797Z test_transformer_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89611 2023-01-11T22:54:21.8240017Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89612 2023-01-11T22:54:21.8240434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.8240615Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.8241003Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.8241198Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.8241570Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.8241733Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.8242118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.8242308Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.8242559Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.8242807Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.8243207Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.8243605Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.8243835Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.8244063Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.8244281Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8244516Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8245540Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.8245659Z warnings.warn( 2023-01-11T22:54:21.8246679Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.8246797Z warnings.warn( 2023-01-11T22:54:21.8247034Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8247268Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8248012Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8248815Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8249594Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8250344Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8250586Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8250821Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8251054Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8251272Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8252016Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8252752Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8253931Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8254665Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8254898Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8255130Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8255367Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8255599Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8255829Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8256060Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8256800Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8257535Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8258413Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8259163Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8259381Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8259614Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8259848Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8260079Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8260307Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8260537Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8260766Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8260993Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8261736Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8262477Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8263218Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8263950Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8264169Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8264399Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8264630Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8264865Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8265095Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8265322Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8266059Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8266854Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8267628Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8268375Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8268615Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8268847Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8269077Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8269293Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8270031Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8270763Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8271503Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8272235Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8272964Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8273700Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8274426Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8275155Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8275981Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8276720Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8277450Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8278184Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8278422Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8278653Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8278884Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8279114Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8279348Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8279578Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8280577Z /opt/conda/lib/python3.10/site-packages/torch/nn/parameter.py:55: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:319.) 
2023-01-11T22:54:21.8280800Z result = type(self)(self.data.clone(memory_format=torch.preserve_format), self.requires_grad) 2023-01-11T22:54:21.8281543Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8282284Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8282522Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8282755Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8282987Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8283216Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8283955Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8284798Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8285548Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8286281Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8286522Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8286755Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8286991Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8287205Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8287435Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8287659Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8288401Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8289136Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8289869Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8290604Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8290722Z dist init r=0, world=2 2023-01-11T22:54:21.8290835Z dist init r=1, world=2 2023-01-11T22:54:21.8290964Z ok (8.822s) 2023-01-11T22:54:21.8291293Z test_transformer_offload_true_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89694 2023-01-11T22:54:21.8291515Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89695 2023-01-11T22:54:21.8291880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.8292058Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.8292439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.8292706Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.8293403Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.8293582Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.8294035Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.8294234Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.8294464Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.8294873Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.8295117Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.8295518Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.8295750Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.8295980Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.8296220Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8296458Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8297481Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.8297603Z warnings.warn( 2023-01-11T22:54:21.8298626Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.8298742Z warnings.warn( 2023-01-11T22:54:21.8298859Z File "", line 1, in 2023-01-11T22:54:21.8299076Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8299221Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8299426Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8299583Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8299801Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8299908Z self.run() 2023-01-11T22:54:21.8300095Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8300243Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8300593Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8300731Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8301097Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8301225Z getattr(self, test_name)() 2023-01-11T22:54:21.8301591Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8301766Z fn() 2023-01-11T22:54:21.8302127Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8302252Z test(self, **param_kwargs) 2023-01-11T22:54:21.8302607Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8302732Z return func(*args, **kwargs) 2023-01-11T22:54:21.8303020Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8303138Z self.run_subtests( 2023-01-11T22:54:21.8303500Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8303663Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8304011Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8304170Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8304547Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8304669Z output = model(*input) 2023-01-11T22:54:21.8304996Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8305141Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8305522Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8305701Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8306052Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8306176Z _lazy_init(state, module) 2023-01-11T22:54:21.8306533Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8306707Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8307109Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8307256Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8307598Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8307726Z return func(*args, **kwargs) 2023-01-11T22:54:21.8308087Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8308190Z p_assert( 2023-01-11T22:54:21.8308530Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8308658Z traceback.print_stack() 2023-01-11T22:54:21.8308791Z File "", line 1, in 2023-01-11T22:54:21.8309009Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8309155Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8309360Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8309495Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8309715Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8309821Z self.run() 2023-01-11T22:54:21.8310027Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8310172Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8310515Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8310651Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8310995Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8311188Z getattr(self, test_name)() 2023-01-11T22:54:21.8311562Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8311662Z fn() 2023-01-11T22:54:21.8312030Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8312237Z test(self, **param_kwargs) 2023-01-11T22:54:21.8312609Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.8312737Z return func(*args, **kwargs) 2023-01-11T22:54:21.8312959Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8313070Z self.run_subtests( 2023-01-11T22:54:21.8313428Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8313598Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8313964Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8314115Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8314492Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8314616Z output = model(*input) 2023-01-11T22:54:21.8314924Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8315064Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8315443Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8315620Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8315992Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8316112Z _lazy_init(state, module) 2023-01-11T22:54:21.8316468Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8316640Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8317023Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8317169Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.8317509Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8317638Z return func(*args, **kwargs) 2023-01-11T22:54:21.8318016Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8318123Z p_assert( 2023-01-11T22:54:21.8318462Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8318590Z traceback.print_stack() 2023-01-11T22:54:21.8318810Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8319049Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8319805Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8320553Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8321367Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8322153Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8322901Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8323640Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8324374Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8325107Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8325842Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8326576Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8326711Z File "", line 1, in 2023-01-11T22:54:21.8326929Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8327073Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8327286Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8327438Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8327637Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8327742Z self.run() 2023-01-11T22:54:21.8327947Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8328100Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8328448Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8328586Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8328953Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8329078Z getattr(self, test_name)() 2023-01-11T22:54:21.8329422Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8329584Z fn() 2023-01-11T22:54:21.8329957Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8330083Z test(self, **param_kwargs) 2023-01-11T22:54:21.8330440Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:54:21.8330614Z return func(*args, **kwargs) 2023-01-11T22:54:21.8330861Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8330975Z self.run_subtests( 2023-01-11T22:54:21.8331316Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8331482Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8331847Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8332007Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8332384Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8332508Z output = model(*input) 2023-01-11T22:54:21.8332836Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8333349Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8333720Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8333895Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8334264Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8334387Z _lazy_init(state, module) 2023-01-11T22:54:21.8334750Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8334919Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8335321Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8335468Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.8335793Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8335919Z return func(*args, **kwargs) 2023-01-11T22:54:21.8336296Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8336401Z p_assert( 2023-01-11T22:54:21.8336739Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8336873Z traceback.print_stack() 2023-01-11T22:54:21.8337005Z File "", line 1, in 2023-01-11T22:54:21.8337216Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8337343Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8337550Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8337703Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8337921Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8338027Z self.run() 2023-01-11T22:54:21.8338231Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8338379Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8338706Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8338841Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8339299Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8339425Z getattr(self, test_name)() 2023-01-11T22:54:21.8339789Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8339887Z fn() 2023-01-11T22:54:21.8340313Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8340446Z test(self, **param_kwargs) 2023-01-11T22:54:21.8340795Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8340921Z return func(*args, **kwargs) 2023-01-11T22:54:21.8341161Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8341276Z self.run_subtests( 2023-01-11T22:54:21.8341629Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8341795Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8342160Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8342316Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8342676Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8342799Z output = model(*input) 2023-01-11T22:54:21.8343127Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8343268Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8343646Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8343824Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8344198Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8344320Z _lazy_init(state, module) 2023-01-11T22:54:21.8344657Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:54:21.8344828Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8345230Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8345375Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8345715Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8345842Z return func(*args, **kwargs) 2023-01-11T22:54:21.8346220Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8346329Z p_assert( 2023-01-11T22:54:21.8346650Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8346779Z traceback.print_stack() 2023-01-11T22:54:21.8347016Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8347257Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8347392Z File "", line 1, in 2023-01-11T22:54:21.8347602Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8347746Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8347950Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8348084Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8348297Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8348467Z self.run() 2023-01-11T22:54:21.8348671Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8348820Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8349165Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8349301Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8349713Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8349827Z getattr(self, test_name)() 2023-01-11T22:54:21.8350198Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8350299Z fn() 2023-01-11T22:54:21.8350666Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8350793Z test(self, **param_kwargs) 2023-01-11T22:54:21.8351150Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8351277Z return func(*args, **kwargs) 2023-01-11T22:54:21.8351516Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8351613Z self.run_subtests( 2023-01-11T22:54:21.8351971Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8352135Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8352502Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8352655Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8353034Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8353163Z output = model(*input) 2023-01-11T22:54:21.8353491Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8353614Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8353991Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8354169Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8354569Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8354687Z _lazy_init(state, module) 2023-01-11T22:54:21.8355038Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8355205Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8355604Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8355735Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8356077Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8356205Z return func(*args, **kwargs) 2023-01-11T22:54:21.8356588Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8356693Z p_assert( 2023-01-11T22:54:21.8357030Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8357163Z traceback.print_stack() 2023-01-11T22:54:21.8357294Z File "", line 1, in 2023-01-11T22:54:21.8357486Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8357630Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8357914Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8358071Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8358286Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8358393Z self.run() 2023-01-11T22:54:21.8358600Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8358771Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8359131Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8359266Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8359632Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8359757Z getattr(self, test_name)() 2023-01-11T22:54:21.8360114Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8360216Z fn() 2023-01-11T22:54:21.8360588Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8360695Z test(self, **param_kwargs) 2023-01-11T22:54:21.8361055Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.8361186Z return func(*args, **kwargs) 2023-01-11T22:54:21.8361429Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8361544Z self.run_subtests( 2023-01-11T22:54:21.8361902Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8362067Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8362431Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8362572Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8362950Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8363073Z output = model(*input) 2023-01-11T22:54:21.8363400Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8363544Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8363928Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8364104Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8364475Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8364580Z _lazy_init(state, module) 2023-01-11T22:54:21.8364939Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8365111Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8365509Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8365654Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.8366000Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8366127Z return func(*args, **kwargs) 2023-01-11T22:54:21.8366509Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8366596Z p_assert( 2023-01-11T22:54:21.8366933Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8367064Z traceback.print_stack() 2023-01-11T22:54:21.8367366Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8367605Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8367740Z File "", line 1, in 2023-01-11T22:54:21.8367952Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8368099Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8368324Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8368483Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8368698Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8368802Z self.run() 2023-01-11T22:54:21.8369007Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8369153Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8369502Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8369621Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8369985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8370109Z getattr(self, test_name)() 2023-01-11T22:54:21.8370478Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8370583Z fn() 2023-01-11T22:54:21.8370951Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8371078Z test(self, **param_kwargs) 2023-01-11T22:54:21.8371437Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8371548Z return func(*args, **kwargs) 2023-01-11T22:54:21.8371792Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8371905Z self.run_subtests( 2023-01-11T22:54:21.8372255Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8372417Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8372784Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8373236Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8373622Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8373727Z output = model(*input) 2023-01-11T22:54:21.8374054Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8374194Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8374578Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8374753Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8375120Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8375242Z 
_lazy_init(state, module) 2023-01-11T22:54:21.8375601Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8375770Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8376151Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8376297Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8376635Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8376845Z return func(*args, **kwargs) 2023-01-11T22:54:21.8377228Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8377331Z p_assert( 2023-01-11T22:54:21.8377664Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8377833Z traceback.print_stack() 2023-01-11T22:54:21.8377973Z File "", line 1, in 2023-01-11T22:54:21.8378185Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8378327Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8378530Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8378681Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8378895Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8379006Z self.run() 2023-01-11T22:54:21.8379195Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8379345Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8379693Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8379828Z self.run_test(test_name, pipe) 
2023-01-11T22:54:21.8380194Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8380321Z getattr(self, test_name)() 2023-01-11T22:54:21.8380683Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8380782Z fn() 2023-01-11T22:54:21.8381132Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8381258Z test(self, **param_kwargs) 2023-01-11T22:54:21.8381615Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8381742Z return func(*args, **kwargs) 2023-01-11T22:54:21.8381984Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8382100Z self.run_subtests( 2023-01-11T22:54:21.8382454Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8382618Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8382964Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8383119Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8383493Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8383613Z output = model(*input) 2023-01-11T22:54:21.8383938Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8384079Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8384456Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8384637Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8384988Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8385112Z _lazy_init(state, module) 2023-01-11T22:54:21.8385466Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8385637Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8386036Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8386243Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8386590Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8386717Z return func(*args, **kwargs) 2023-01-11T22:54:21.8387126Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8387237Z p_assert( 2023-01-11T22:54:21.8387581Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8387709Z traceback.print_stack() 2023-01-11T22:54:21.8388461Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8389208Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8389953Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8390693Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8391433Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8392192Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8392925Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8393658Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8393901Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8394140Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8394273Z File "", line 1, in 2023-01-11T22:54:21.8394472Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8394617Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8394820Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8395031Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8395248Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8395354Z self.run() 2023-01-11T22:54:21.8395558Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8395704Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8396084Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8396223Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8396594Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8396719Z getattr(self, test_name)() 2023-01-11T22:54:21.8397081Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8397181Z fn() 2023-01-11T22:54:21.8397551Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8397657Z test(self, **param_kwargs) 2023-01-11T22:54:21.8398017Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 
2023-01-11T22:54:21.8398145Z return func(*args, **kwargs) 2023-01-11T22:54:21.8398388Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8398504Z self.run_subtests( 2023-01-11T22:54:21.8398860Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8399024Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8399390Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8399528Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8399908Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8400031Z output = model(*input) 2023-01-11T22:54:21.8400354Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8400494Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8400873Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8401051Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8401421Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8401543Z _lazy_init(state, module) 2023-01-11T22:54:21.8401879Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8402053Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8402453Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8402599Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.8402939Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8403069Z return func(*args, **kwargs) 2023-01-11T22:54:21.8403450Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8403556Z p_assert( 2023-01-11T22:54:21.8403877Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8404006Z traceback.print_stack() 2023-01-11T22:54:21.8404135Z File "", line 1, in 2023-01-11T22:54:21.8404347Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8404556Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8404758Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8404911Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8405107Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8405215Z self.run() 2023-01-11T22:54:21.8405461Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8405611Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8405958Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8406094Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8406460Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8406587Z getattr(self, test_name)() 2023-01-11T22:54:21.8406930Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8407031Z fn() 2023-01-11T22:54:21.8407405Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 248, in instantiated_test 2023-01-11T22:54:21.8407531Z test(self, **param_kwargs) 2023-01-11T22:54:21.8407893Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8408019Z return func(*args, **kwargs) 2023-01-11T22:54:21.8408260Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8408374Z self.run_subtests( 2023-01-11T22:54:21.8408711Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8408879Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8409245Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8409402Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8409779Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8409900Z output = model(*input) 2023-01-11T22:54:21.8410228Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8410366Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8410732Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8410910Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8411280Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8411408Z _lazy_init(state, module) 2023-01-11T22:54:21.8411763Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8411933Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:54:21.8412335Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8412480Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8412800Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8413348Z return func(*args, **kwargs) 2023-01-11T22:54:21.8413742Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8413848Z p_assert( 2023-01-11T22:54:21.8414283Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8414411Z traceback.print_stack() 2023-01-11T22:54:21.8414652Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8414888Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8415001Z File "", line 1, in 2023-01-11T22:54:21.8415272Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8415431Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8415636Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8415789Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8416004Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8416111Z self.run() 2023-01-11T22:54:21.8416297Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8416451Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8416804Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8416940Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8417305Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8417434Z getattr(self, test_name)() 2023-01-11T22:54:21.8417797Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8417900Z fn() 2023-01-11T22:54:21.8418249Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8418374Z test(self, **param_kwargs) 2023-01-11T22:54:21.8418732Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8418862Z return func(*args, **kwargs) 2023-01-11T22:54:21.8419103Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8419221Z self.run_subtests( 2023-01-11T22:54:21.8419576Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8419743Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8420090Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8420244Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8420617Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8420739Z output = model(*input) 2023-01-11T22:54:21.8421064Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8421208Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8421590Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8421767Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8422119Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8422242Z _lazy_init(state, module) 2023-01-11T22:54:21.8422601Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8422773Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8423170Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8423387Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8423730Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8423858Z return func(*args, **kwargs) 2023-01-11T22:54:21.8424238Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8424325Z p_assert( 2023-01-11T22:54:21.8424710Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8424845Z traceback.print_stack() 2023-01-11T22:54:21.8424978Z File "", line 1, in 2023-01-11T22:54:21.8425194Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8425339Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8425543Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8425682Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8425895Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8426000Z self.run() 2023-01-11T22:54:21.8426206Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8426355Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8426707Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8426843Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8427208Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8427317Z getattr(self, test_name)() 2023-01-11T22:54:21.8427679Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8427779Z fn() 2023-01-11T22:54:21.8428148Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8428273Z test(self, **param_kwargs) 2023-01-11T22:54:21.8428632Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.8428759Z return func(*args, **kwargs) 2023-01-11T22:54:21.8428999Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8429097Z self.run_subtests( 2023-01-11T22:54:21.8429451Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8429616Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8429985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8430140Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8430519Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8430641Z output = model(*input) 2023-01-11T22:54:21.8430967Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8431090Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8431474Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8431651Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8432019Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8432143Z _lazy_init(state, module) 2023-01-11T22:54:21.8432497Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8432729Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8433132Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8433261Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.8433599Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8433775Z return func(*args, **kwargs) 2023-01-11T22:54:21.8434166Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8434271Z p_assert( 2023-01-11T22:54:21.8434609Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8434740Z traceback.print_stack() 2023-01-11T22:54:21.8434978Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8435201Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8435337Z File "", line 1, in 2023-01-11T22:54:21.8435549Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8435695Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8435898Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8436053Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8436267Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8436354Z self.run() 2023-01-11T22:54:21.8436559Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8436706Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8437049Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8437187Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8437549Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8437675Z getattr(self, test_name)() 2023-01-11T22:54:21.8438038Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8438120Z fn() 2023-01-11T22:54:21.8438490Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8438619Z test(self, **param_kwargs) 2023-01-11T22:54:21.8438979Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8439107Z return func(*args, **kwargs) 2023-01-11T22:54:21.8439348Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8439468Z self.run_subtests( 2023-01-11T22:54:21.8439820Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8439966Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8440335Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8440491Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8440868Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8440992Z output = model(*input) 2023-01-11T22:54:21.8441318Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8441463Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8441842Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8442062Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8442437Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8442562Z 
_lazy_init(state, module) 2023-01-11T22:54:21.8442919Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8443135Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8443548Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8443694Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8444033Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8444142Z return func(*args, **kwargs) 2023-01-11T22:54:21.8444521Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8444626Z p_assert( 2023-01-11T22:54:21.8444965Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8445094Z traceback.print_stack() 2023-01-11T22:54:21.8445226Z File "", line 1, in 2023-01-11T22:54:21.8445448Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8445593Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8445779Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8445929Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8446145Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8446252Z self.run() 2023-01-11T22:54:21.8446457Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8446608Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8446950Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8447066Z self.run_test(test_name, pipe) 
2023-01-11T22:54:21.8447432Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8447561Z getattr(self, test_name)() 2023-01-11T22:54:21.8447925Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8448025Z fn() 2023-01-11T22:54:21.8448392Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8448517Z test(self, **param_kwargs) 2023-01-11T22:54:21.8448876Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8448988Z return func(*args, **kwargs) 2023-01-11T22:54:21.8449232Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8449346Z self.run_subtests( 2023-01-11T22:54:21.8449703Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8449871Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8450239Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8450393Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8450770Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8450873Z output = model(*input) 2023-01-11T22:54:21.8451201Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8451404Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8451787Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8451965Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8452375Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8452503Z _lazy_init(state, module) 2023-01-11T22:54:21.8453257Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8453439Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8453825Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8453974Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8454312Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8454439Z return func(*args, **kwargs) 2023-01-11T22:54:21.8454819Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8454924Z p_assert( 2023-01-11T22:54:21.8455264Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8455392Z traceback.print_stack() 2023-01-11T22:54:21.8456126Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8456878Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8457627Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8458365Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8459100Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8459841Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8460573Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8461306Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8462219Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8462965Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8463696Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8464435Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8465163Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8465888Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8466624Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8467352Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8467596Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8467839Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8467973Z File "", line 1, in 2023-01-11T22:54:21.8468189Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8468335Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8468545Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8468680Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8468895Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8469000Z self.run() 2023-01-11T22:54:21.8469204Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8469347Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8469691Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8469881Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8470253Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8470360Z getattr(self, test_name)() 2023-01-11T22:54:21.8470723Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8470868Z fn() 2023-01-11T22:54:21.8471249Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8471376Z test(self, **param_kwargs) 2023-01-11T22:54:21.8471736Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8471864Z return func(*args, **kwargs) 2023-01-11T22:54:21.8472086Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8472204Z self.run_subtests( 2023-01-11T22:54:21.8472563Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8472726Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8473094Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8473251Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8473628Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8473748Z output = model(*input) 2023-01-11T22:54:21.8474074Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8474197Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8474579Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8474761Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8475128Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8475249Z _lazy_init(state, module) 2023-01-11T22:54:21.8475606Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8475777Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8476174Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8476302Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8476641Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8476774Z return func(*args, **kwargs) 2023-01-11T22:54:21.8477157Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8477261Z p_assert( 2023-01-11T22:54:21.8477598Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8477727Z traceback.print_stack() 2023-01-11T22:54:21.8477863Z File "", line 1, in 2023-01-11T22:54:21.8478058Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8478199Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8478404Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8478557Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8478771Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8478935Z self.run() 2023-01-11T22:54:21.8479142Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8479272Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8479619Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8479752Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8480158Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8480290Z getattr(self, test_name)() 2023-01-11T22:54:21.8480655Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8480753Z fn() 2023-01-11T22:54:21.8481117Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8481225Z test(self, **param_kwargs) 2023-01-11T22:54:21.8481587Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.8481713Z return func(*args, **kwargs) 2023-01-11T22:54:21.8481954Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8482069Z self.run_subtests( 2023-01-11T22:54:21.8482430Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8482594Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8482957Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8483095Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8483470Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8483591Z output = model(*input) 2023-01-11T22:54:21.8483917Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8484053Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8484429Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8484604Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8484978Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8485083Z _lazy_init(state, module) 2023-01-11T22:54:21.8485438Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8485606Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8486005Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8486155Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.8486491Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8486616Z return func(*args, **kwargs) 2023-01-11T22:54:21.8486994Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8487084Z p_assert( 2023-01-11T22:54:21.8487423Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8487551Z traceback.print_stack() 2023-01-11T22:54:21.8487793Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8488031Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8488164Z File "", line 1, in 2023-01-11T22:54:21.8488441Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8488585Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8488771Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8488925Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8489139Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8489287Z self.run() 2023-01-11T22:54:21.8489498Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8489648Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8489998Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8490118Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8490483Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8490611Z getattr(self, test_name)() 2023-01-11T22:54:21.8490973Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8491070Z fn() 2023-01-11T22:54:21.8491439Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8491563Z test(self, **param_kwargs) 2023-01-11T22:54:21.8491925Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8492060Z return func(*args, **kwargs) 2023-01-11T22:54:21.8492302Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8492417Z self.run_subtests( 2023-01-11T22:54:21.8492773Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8493121Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8493496Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8493651Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8494027Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8494136Z output = model(*input) 2023-01-11T22:54:21.8494468Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8494612Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8494992Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8495168Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8495536Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8495663Z 
_lazy_init(state, module) 2023-01-11T22:54:21.8496017Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8496169Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8496572Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8496719Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8497057Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8497182Z return func(*args, **kwargs) 2023-01-11T22:54:21.8497564Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8497669Z p_assert( 2023-01-11T22:54:21.8498099Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8498210Z traceback.print_stack() 2023-01-11T22:54:21.8498341Z File "", line 1, in 2023-01-11T22:54:21.8498556Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8498702Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8498966Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8499129Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8499344Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8499452Z self.run() 2023-01-11T22:54:21.8499642Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8499794Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8500142Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8500282Z self.run_test(test_name, pipe) 
2023-01-11T22:54:21.8500649Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8500774Z getattr(self, test_name)() 2023-01-11T22:54:21.8501135Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8501239Z fn() 2023-01-11T22:54:21.8501590Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8501715Z test(self, **param_kwargs) 2023-01-11T22:54:21.8502073Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8502195Z return func(*args, **kwargs) 2023-01-11T22:54:21.8502433Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8502553Z self.run_subtests( 2023-01-11T22:54:21.8502906Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8503070Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8503419Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8503577Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8503957Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8504078Z output = model(*input) 2023-01-11T22:54:21.8504405Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8504547Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8504928Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8505109Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8505464Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8505586Z _lazy_init(state, module) 2023-01-11T22:54:21.8505941Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8506112Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8506508Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8506653Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8506992Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8507182Z return func(*args, **kwargs) 2023-01-11T22:54:21.8507549Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8507652Z p_assert( 2023-01-11T22:54:21.8507989Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8508117Z traceback.print_stack() 2023-01-11T22:54:21.8508400Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8508644Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8508776Z File "", line 1, in 2023-01-11T22:54:21.8508987Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8509114Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8509316Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8509472Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8509686Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8509796Z self.run() 2023-01-11T22:54:21.8510003Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8510152Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8510486Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8510623Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8510986Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8511113Z getattr(self, test_name)() 2023-01-11T22:54:21.8511476Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8511577Z fn() 2023-01-11T22:54:21.8511949Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8512074Z test(self, **param_kwargs) 2023-01-11T22:54:21.8512415Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8512544Z return func(*args, **kwargs) 2023-01-11T22:54:21.8512782Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8512895Z self.run_subtests( 2023-01-11T22:54:21.8513249Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8513410Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8513776Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8513933Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8514293Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8514415Z output = model(*input) 2023-01-11T22:54:21.8514744Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8514884Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8515266Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8515445Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8515817Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8515942Z _lazy_init(state, module) 2023-01-11T22:54:21.8516279Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8516507Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8516913Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8517054Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8517392Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8517567Z return func(*args, **kwargs) 2023-01-11T22:54:21.8517961Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8518065Z p_assert( 2023-01-11T22:54:21.8518382Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8518514Z traceback.print_stack() 2023-01-11T22:54:21.8518646Z File "", line 1, in 2023-01-11T22:54:21.8518864Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8519009Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8519214Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8519367Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8519583Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8519671Z self.run() 2023-01-11T22:54:21.8519879Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8520025Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8520371Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8520506Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8520864Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8520991Z getattr(self, test_name)() 2023-01-11T22:54:21.8521352Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8521434Z fn() 2023-01-11T22:54:21.8521802Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8521928Z test(self, **param_kwargs) 2023-01-11T22:54:21.8522287Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.8522414Z return func(*args, **kwargs) 2023-01-11T22:54:21.8522657Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8522770Z self.run_subtests( 2023-01-11T22:54:21.8523105Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8523270Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8523636Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8523791Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8524167Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8524288Z output = model(*input) 2023-01-11T22:54:21.8524616Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8524755Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8525115Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8525290Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8525657Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8525839Z _lazy_init(state, module) 2023-01-11T22:54:21.8526195Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8526362Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8526806Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8526954Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.8527293Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8527403Z return func(*args, **kwargs) 2023-01-11T22:54:21.8527778Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8527881Z p_assert( 2023-01-11T22:54:21.8528221Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8528350Z traceback.print_stack() 2023-01-11T22:54:21.8529104Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8529850Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8530593Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8531340Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8532076Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8532805Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8533755Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8534492Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8534733Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8534971Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8535167Z File "", line 1, in 2023-01-11T22:54:21.8535383Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8535529Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8535735Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8535886Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8536152Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8536264Z self.run() 2023-01-11T22:54:21.8536450Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8536599Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8536954Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8537088Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8537459Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8537583Z getattr(self, test_name)() 2023-01-11T22:54:21.8537943Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8538041Z fn() 2023-01-11T22:54:21.8538391Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8538518Z test(self, **param_kwargs) 2023-01-11T22:54:21.8538871Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8539001Z return func(*args, **kwargs) 2023-01-11T22:54:21.8539242Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8539357Z self.run_subtests( 2023-01-11T22:54:21.8539716Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8539882Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8540229Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8540385Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8540763Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8540885Z output = model(*input) 2023-01-11T22:54:21.8541210Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8541351Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8541736Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8541917Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8542269Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8542394Z _lazy_init(state, module) 2023-01-11T22:54:21.8542747Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8542920Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8543317Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8543464Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8543802Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8543929Z return func(*args, **kwargs) 2023-01-11T22:54:21.8544307Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8544460Z p_assert( 2023-01-11T22:54:21.8544805Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8544932Z traceback.print_stack() 2023-01-11T22:54:21.8545067Z File "", line 1, in 2023-01-11T22:54:21.8545279Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8545471Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8545680Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8545815Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8546030Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8546136Z self.run() 2023-01-11T22:54:21.8546343Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8546496Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8546842Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8546976Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8547339Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8547446Z getattr(self, test_name)() 2023-01-11T22:54:21.8547811Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8547912Z fn() 2023-01-11T22:54:21.8548282Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8548407Z test(self, **param_kwargs) 2023-01-11T22:54:21.8548767Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.8548896Z return func(*args, **kwargs) 2023-01-11T22:54:21.8549133Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8549229Z self.run_subtests( 2023-01-11T22:54:21.8549583Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8549750Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8550117Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8550272Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8550651Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8550771Z output = model(*input) 2023-01-11T22:54:21.8551097Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8551223Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8551602Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8551780Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8552146Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8552272Z _lazy_init(state, module) 2023-01-11T22:54:21.8552623Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8552782Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8553181Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8553323Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.8553743Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8553869Z return func(*args, **kwargs) 2023-01-11T22:54:21.8554247Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8554349Z p_assert( 2023-01-11T22:54:21.8554716Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8554847Z traceback.print_stack() 2023-01-11T22:54:21.8555082Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8555318Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8555451Z File "", line 1, in 2023-01-11T22:54:21.8555663Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8555811Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8555997Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8556147Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8556361Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8556464Z self.run() 2023-01-11T22:54:21.8556666Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8556816Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8557165Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8557302Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8557649Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8557775Z getattr(self, test_name)() 2023-01-11T22:54:21.8558137Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8558241Z fn() 2023-01-11T22:54:21.8558609Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8558732Z test(self, **param_kwargs) 2023-01-11T22:54:21.8559090Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8559221Z return func(*args, **kwargs) 2023-01-11T22:54:21.8559447Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8559555Z self.run_subtests( 2023-01-11T22:54:21.8559909Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8560070Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8560433Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8560597Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8560974Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8561096Z output = model(*input) 2023-01-11T22:54:21.8561409Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8561552Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8561930Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8562104Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8562472Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8562593Z 
_lazy_init(state, module) 2023-01-11T22:54:21.8563017Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8563187Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8563566Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8563706Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8564095Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8564229Z return func(*args, **kwargs) 2023-01-11T22:54:21.8564611Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8564715Z p_assert( 2023-01-11T22:54:21.8565052Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8565182Z traceback.print_stack() 2023-01-11T22:54:21.8565296Z File "", line 1, in 2023-01-11T22:54:21.8565507Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8565649Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8565849Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8566001Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8566218Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8566324Z self.run() 2023-01-11T22:54:21.8566511Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8566660Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8567000Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8567129Z self.run_test(test_name, pipe) 
2023-01-11T22:54:21.8567496Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8567621Z getattr(self, test_name)() 2023-01-11T22:54:21.8567978Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8568075Z fn() 2023-01-11T22:54:21.8568427Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8568554Z test(self, **param_kwargs) 2023-01-11T22:54:21.8568908Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8569034Z return func(*args, **kwargs) 2023-01-11T22:54:21.8569274Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8569391Z self.run_subtests( 2023-01-11T22:54:21.8569750Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8569914Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8570262Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8570415Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8570796Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8570918Z output = model(*input) 2023-01-11T22:54:21.8571241Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8571378Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8571755Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8571991Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8572348Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8572470Z _lazy_init(state, module) 2023-01-11T22:54:21.8572820Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8573195Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8573618Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8573764Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8574099Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8574223Z return func(*args, **kwargs) 2023-01-11T22:54:21.8574582Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8574692Z p_assert( 2023-01-11T22:54:21.8575029Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8575156Z traceback.print_stack() 2023-01-11T22:54:21.8575395Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8575637Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8576388Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8577129Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8577877Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8578613Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8579346Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8580085Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8580823Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8581552Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8582418Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8583159Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8583891Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8584627Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8584869Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8585087Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8585321Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8585555Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8585787Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8586014Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8586752Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8587485Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8588222Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8588964Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8589691Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8590421Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8591263Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8592005Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8592760Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8593500Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8594231Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8594960Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8595696Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8596424Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8596664Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8596902Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8597134Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8597348Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8598091Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8598823Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8599610Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8600419Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8601165Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8601896Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8602630Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8603354Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8604081Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8604815Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8605543Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8606268Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8607000Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8607723Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8608446Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8609276Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8610011Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8610733Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8611465Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8612186Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8613104Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8613850Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8614582Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8615305Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8615546Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8615784Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8616019Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8616251Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8616981Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8617810Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8618597Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8619346Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8620079Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8620805Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8621532Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8622260Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8622989Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8623714Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8624441Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8625150Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8625879Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8626672Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8626952Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8627191Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8627422Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8627653Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8627880Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8628107Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8628852Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8629585Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8630316Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8631052Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8631779Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8632510Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8633245Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8633973Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8634699Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8635486Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8636256Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8636994Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8637721Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8638448Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8638685Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8638903Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8639135Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8639369Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8640108Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8640840Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8641568Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8642298Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8643026Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8643749Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8644586Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8645327Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8646057Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8646789Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8647515Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8648240Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8648967Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8649695Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8649931Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8650165Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8650400Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8650630Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8650842Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8651073Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8651812Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8652543Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8653593Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8654346Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8655105Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8655841Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8656572Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8657300Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8658036Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8658756Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8659477Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8660206Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8660935Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8661658Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8661964Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8662196Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8662472Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8662707Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8663454Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8664184Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8664923Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8665649Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8665880Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8666100Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8666330Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8666560Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8667299Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8668030Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8668763Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8669494Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8670218Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8671008Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8671779Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8672515Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8673244Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8673973Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8674695Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8675421Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8676151Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8676871Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8677112Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8677347Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8677580Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8677815Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8678028Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8678256Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8678992Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8679802Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8680581Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8681326Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8682060Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8682787Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8683514Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8684239Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8684966Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8685690Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8686424Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8687149Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8687880Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8688663Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8688942Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8689181Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8689294Z dist init r=1, world=2 2023-01-11T22:54:21.8689627Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8689947Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8690256Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8690544Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8690848Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8691147Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8691446Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8691747Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8692046Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8692348Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8692646Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.8693134Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 
2023-01-11T22:54:21.8693257Z dist init r=0, world=2 2023-01-11T22:54:21.8693583Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8693904Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8694198Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8694505Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8694810Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8695110Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8695487Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8695789Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8696140Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8696449Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.8696752Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8697059Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.8697159Z ok (10.024s) 2023-01-11T22:54:21.8697458Z test_transformer_offload_true_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89777 2023-01-11T22:54:21.8697681Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89778 2023-01-11T22:54:21.8698071Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.8698250Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.8698630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.8698820Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.8699194Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.8699372Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.8699750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.8699924Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.8700176Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.8700422Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.8700824Z 
INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.8701223Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.8701460Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.8701687Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.8701922Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8702160Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8703166Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.8703344Z warnings.warn( 2023-01-11T22:54:21.8704417Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.8704537Z warnings.warn( 2023-01-11T22:54:21.8704668Z File "", line 1, in 2023-01-11T22:54:21.8704882Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8705025Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8705230Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8705382Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8705602Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8705691Z self.run() 2023-01-11T22:54:21.8705894Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8706045Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8706400Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8706536Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8706905Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8707028Z getattr(self, test_name)() 2023-01-11T22:54:21.8707374Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8707472Z fn() 2023-01-11T22:54:21.8707838Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8707966Z test(self, **param_kwargs) 2023-01-11T22:54:21.8708327Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8708453Z return func(*args, **kwargs) 2023-01-11T22:54:21.8708690Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8708804Z self.run_subtests( 2023-01-11T22:54:21.8709147Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8709309Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8709677Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8709828Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8710206Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8710330Z output = model(*input) 2023-01-11T22:54:21.8710655Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8710794Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8711154Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8711334Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8711703Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8711823Z _lazy_init(state, module) 2023-01-11T22:54:21.8712175Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8712345Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8712822Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8712964Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8713302Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8713412Z return func(*args, **kwargs) 2023-01-11T22:54:21.8713833Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8713943Z p_assert( 2023-01-11T22:54:21.8714286Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8714412Z traceback.print_stack() 2023-01-11T22:54:21.8714541Z File "", line 1, in 2023-01-11T22:54:21.8714751Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8714884Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8715088Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8715238Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8715451Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8715557Z self.run() 2023-01-11T22:54:21.8715759Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8715909Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8716253Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8716371Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8716735Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8716862Z getattr(self, test_name)() 2023-01-11T22:54:21.8717222Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8717323Z fn() 2023-01-11T22:54:21.8717689Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8717813Z test(self, **param_kwargs) 2023-01-11T22:54:21.8718164Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.8718277Z return func(*args, **kwargs) 2023-01-11T22:54:21.8718520Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8718637Z self.run_subtests( 2023-01-11T22:54:21.8718992Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8719156Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8719524Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8719680Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8720058Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8720162Z output = model(*input) 2023-01-11T22:54:21.8720491Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8720628Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8721005Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8721179Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8721542Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8721731Z _lazy_init(state, module) 2023-01-11T22:54:21.8722095Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8722248Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8722646Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8722790Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.8723178Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8723306Z return func(*args, **kwargs) 2023-01-11T22:54:21.8723693Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8723797Z p_assert( 2023-01-11T22:54:21.8724135Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8724251Z traceback.print_stack() 2023-01-11T22:54:21.8724489Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8724724Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8725481Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8726220Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8726960Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8727705Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8728444Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8729174Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8729916Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8730645Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8731377Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8732215Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8732354Z File "", line 1, in 2023-01-11T22:54:21.8732567Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8732714Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8733105Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8733247Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8733469Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8733575Z self.run() 2023-01-11T22:54:21.8733782Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8733932Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8734288Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8734423Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8734770Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8734893Z getattr(self, test_name)() 2023-01-11T22:54:21.8735252Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8735350Z fn() 2023-01-11T22:54:21.8735718Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8735844Z test(self, **param_kwargs) 2023-01-11T22:54:21.8736202Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
167, in wrapper 2023-01-11T22:54:21.8736329Z return func(*args, **kwargs) 2023-01-11T22:54:21.8736552Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8736669Z self.run_subtests( 2023-01-11T22:54:21.8737026Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8737190Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8737554Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8737705Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8738090Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8738210Z output = model(*input) 2023-01-11T22:54:21.8738520Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8738659Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8739037Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8739212Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8739582Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8739705Z _lazy_init(state, module) 2023-01-11T22:54:21.8740059Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8740359Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8740770Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8740898Z 
handle.init_flat_param_attributes() 2023-01-11T22:54:21.8741235Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8741363Z return func(*args, **kwargs) 2023-01-11T22:54:21.8741808Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8741920Z p_assert( 2023-01-11T22:54:21.8742262Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8742389Z traceback.print_stack() 2023-01-11T22:54:21.8742503Z File "", line 1, in 2023-01-11T22:54:21.8742714Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8742862Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8743065Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8743218Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8743432Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8743535Z self.run() 2023-01-11T22:54:21.8743740Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8743870Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8744210Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8744344Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8744706Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8744833Z getattr(self, test_name)() 2023-01-11T22:54:21.8745194Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8745292Z fn() 2023-01-11T22:54:21.8745655Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8745761Z test(self, **param_kwargs) 2023-01-11T22:54:21.8746119Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8746245Z return func(*args, **kwargs) 2023-01-11T22:54:21.8746485Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8746598Z self.run_subtests( 2023-01-11T22:54:21.8746951Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8747112Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8747481Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8747619Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8747996Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8748117Z output = model(*input) 2023-01-11T22:54:21.8748443Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8748581Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8748962Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8749135Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8749501Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8749671Z _lazy_init(state, module) 2023-01-11T22:54:21.8750035Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 
2023-01-11T22:54:21.8750206Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8750604Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8750792Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8751141Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8751264Z return func(*args, **kwargs) 2023-01-11T22:54:21.8751641Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8751728Z p_assert( 2023-01-11T22:54:21.8752064Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8752194Z traceback.print_stack() 2023-01-11T22:54:21.8752433Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8752670Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8752801Z File "", line 1, in 2023-01-11T22:54:21.8753016Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8753159Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8753344Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8753495Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8753705Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8753809Z self.run() 2023-01-11T22:54:21.8754013Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8754164Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8754505Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8754621Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8754982Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8755110Z getattr(self, test_name)() 2023-01-11T22:54:21.8755472Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8755570Z fn() 2023-01-11T22:54:21.8755939Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8756062Z test(self, **param_kwargs) 2023-01-11T22:54:21.8756420Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8756533Z return func(*args, **kwargs) 2023-01-11T22:54:21.8756769Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8756881Z self.run_subtests( 2023-01-11T22:54:21.8757236Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8757402Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8757761Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8757915Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8758285Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8758389Z output = model(*input) 2023-01-11T22:54:21.8758709Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8758913Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8759299Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8759474Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8759886Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8760014Z _lazy_init(state, module) 2023-01-11T22:54:21.8760376Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8760529Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8760928Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8761073Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8761411Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8761538Z return func(*args, **kwargs) 2023-01-11T22:54:21.8761916Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8762020Z p_assert( 2023-01-11T22:54:21.8762358Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8762468Z traceback.print_stack() 2023-01-11T22:54:21.8762596Z File "", line 1, in 2023-01-11T22:54:21.8762807Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8762949Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8763148Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8763298Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8763512Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8763615Z self.run() 2023-01-11T22:54:21.8763800Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8763946Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8764292Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8764426Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8764787Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8764906Z getattr(self, test_name)() 2023-01-11T22:54:21.8765262Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8765359Z fn() 2023-01-11T22:54:21.8765708Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8765835Z test(self, **param_kwargs) 2023-01-11T22:54:21.8766195Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.8766317Z return func(*args, **kwargs) 2023-01-11T22:54:21.8766556Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8766672Z self.run_subtests( 2023-01-11T22:54:21.8767025Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8767171Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8767533Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8767684Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8768130Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8768250Z output = model(*input) 2023-01-11T22:54:21.8768578Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8768720Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8769148Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8769331Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8769689Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8769810Z _lazy_init(state, module) 2023-01-11T22:54:21.8770162Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8770332Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8770729Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8770873Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.8771206Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8771333Z return func(*args, **kwargs) 2023-01-11T22:54:21.8771696Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8771799Z p_assert( 2023-01-11T22:54:21.8772135Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8772264Z traceback.print_stack() 2023-01-11T22:54:21.8772502Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8772743Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8773046Z File "", line 1, in 2023-01-11T22:54:21.8773250Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8773394Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8773593Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8773747Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8773962Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8774061Z self.run() 2023-01-11T22:54:21.8774264Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8774410Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8774739Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8774871Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8775235Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8775358Z getattr(self, test_name)() 2023-01-11T22:54:21.8775719Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8775819Z fn() 2023-01-11T22:54:21.8776186Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8776309Z test(self, **param_kwargs) 2023-01-11T22:54:21.8776647Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8776774Z return func(*args, **kwargs) 2023-01-11T22:54:21.8777014Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8777210Z self.run_subtests( 2023-01-11T22:54:21.8777572Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8777734Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8778097Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8778246Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8778663Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8778794Z output = model(*input) 2023-01-11T22:54:21.8779124Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8779263Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8779640Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8779818Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8780186Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8780306Z 
_lazy_init(state, module) 2023-01-11T22:54:21.8780643Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8780815Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8781215Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8781358Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8781696Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8781821Z return func(*args, **kwargs) 2023-01-11T22:54:21.8782203Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8782307Z p_assert( 2023-01-11T22:54:21.8782627Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8782751Z traceback.print_stack() 2023-01-11T22:54:21.8782880Z File "", line 1, in 2023-01-11T22:54:21.8783093Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8783238Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8783441Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8783592Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8783803Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8783891Z self.run() 2023-01-11T22:54:21.8784093Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8784244Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8784586Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8784720Z self.run_test(test_name, pipe) 
2023-01-11T22:54:21.8785082Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8785205Z getattr(self, test_name)() 2023-01-11T22:54:21.8785549Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8785650Z fn() 2023-01-11T22:54:21.8786019Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8786140Z test(self, **param_kwargs) 2023-01-11T22:54:21.8786497Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8786682Z return func(*args, **kwargs) 2023-01-11T22:54:21.8786921Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8787031Z self.run_subtests( 2023-01-11T22:54:21.8787371Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8787580Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8787958Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8788114Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8788489Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8788607Z output = model(*input) 2023-01-11T22:54:21.8788935Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8789080Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8789440Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8789615Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8789981Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8790100Z _lazy_init(state, module) 2023-01-11T22:54:21.8790452Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8790619Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8791013Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8791155Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8791494Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8791604Z return func(*args, **kwargs) 2023-01-11T22:54:21.8791979Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8792080Z p_assert( 2023-01-11T22:54:21.8792419Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8792545Z traceback.print_stack() 2023-01-11T22:54:21.8793297Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8794067Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8794815Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8795553Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8796289Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8797131Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8797878Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8798611Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8798856Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8799091Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8799208Z File "", line 1, in 2023-01-11T22:54:21.8799425Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8799570Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8799773Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8799923Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8800139Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8800248Z self.run() 2023-01-11T22:54:21.8800437Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8800586Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8800937Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8801072Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8801440Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8801565Z getattr(self, test_name)() 2023-01-11T22:54:21.8801926Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8802025Z fn() 2023-01-11T22:54:21.8802374Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8802503Z test(self, **param_kwargs) 2023-01-11T22:54:21.8802863Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 
2023-01-11T22:54:21.8802989Z return func(*args, **kwargs) 2023-01-11T22:54:21.8803231Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8803344Z self.run_subtests( 2023-01-11T22:54:21.8803702Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8803865Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8804214Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8804368Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8804740Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8804935Z output = model(*input) 2023-01-11T22:54:21.8805268Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8805408Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8805786Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8806004Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8806367Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8806493Z _lazy_init(state, module) 2023-01-11T22:54:21.8806848Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8807017Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8807413Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8807556Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.8807895Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8808022Z return func(*args, **kwargs) 2023-01-11T22:54:21.8808388Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8808495Z p_assert( 2023-01-11T22:54:21.8808834Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8808961Z traceback.print_stack() 2023-01-11T22:54:21.8809093Z File "", line 1, in 2023-01-11T22:54:21.8809301Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8809443Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8809649Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8809783Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8809995Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8810098Z self.run() 2023-01-11T22:54:21.8810304Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8810453Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8810798Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8810937Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8811282Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8811406Z getattr(self, test_name)() 2023-01-11T22:54:21.8811766Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8811868Z fn() 2023-01-11T22:54:21.8812233Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 248, in instantiated_test 2023-01-11T22:54:21.8812355Z test(self, **param_kwargs) 2023-01-11T22:54:21.8812714Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8812837Z return func(*args, **kwargs) 2023-01-11T22:54:21.8813254Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8813371Z self.run_subtests( 2023-01-11T22:54:21.8813730Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8813893Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8814257Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8814496Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8814878Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8815000Z output = model(*input) 2023-01-11T22:54:21.8815306Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8815505Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8815901Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8816078Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8816447Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8816567Z _lazy_init(state, module) 2023-01-11T22:54:21.8816928Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8817095Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:54:21.8817492Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8817620Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8817961Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8818086Z return func(*args, **kwargs) 2023-01-11T22:54:21.8818461Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8818562Z p_assert( 2023-01-11T22:54:21.8818898Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8819027Z traceback.print_stack() 2023-01-11T22:54:21.8819253Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8819491Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8819621Z File "", line 1, in 2023-01-11T22:54:21.8819830Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8819975Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8820181Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8820331Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8820546Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8820634Z self.run() 2023-01-11T22:54:21.8820835Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8820982Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8821325Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8821458Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8821823Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8821946Z getattr(self, test_name)() 2023-01-11T22:54:21.8822310Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8822394Z fn() 2023-01-11T22:54:21.8822759Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8822881Z test(self, **param_kwargs) 2023-01-11T22:54:21.8823237Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8823363Z return func(*args, **kwargs) 2023-01-11T22:54:21.8823671Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8823785Z self.run_subtests( 2023-01-11T22:54:21.8824145Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8824292Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8824705Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8824867Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8825247Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8825366Z output = model(*input) 2023-01-11T22:54:21.8825691Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8825829Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8826212Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8826371Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8826736Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8826857Z _lazy_init(state, module) 2023-01-11T22:54:21.8827218Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8827384Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8827780Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8827923Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8828258Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8828371Z return func(*args, **kwargs) 2023-01-11T22:54:21.8828751Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8828853Z p_assert( 2023-01-11T22:54:21.8829189Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8829314Z traceback.print_stack() 2023-01-11T22:54:21.8829447Z File "", line 1, in 2023-01-11T22:54:21.8829660Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8829805Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8829993Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8830147Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8830363Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8830473Z self.run() 2023-01-11T22:54:21.8830677Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8830824Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8831163Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8831280Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8831646Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8831772Z getattr(self, test_name)() 2023-01-11T22:54:21.8832133Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8832233Z fn() 2023-01-11T22:54:21.8832599Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8832787Z test(self, **param_kwargs) 2023-01-11T22:54:21.8833154Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.8833263Z return func(*args, **kwargs) 2023-01-11T22:54:21.8833506Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8833621Z self.run_subtests( 2023-01-11T22:54:21.8834019Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8834188Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8834557Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8834712Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8835089Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8835198Z output = model(*input) 2023-01-11T22:54:21.8835525Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8835662Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8836044Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8836226Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8836594Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8836716Z _lazy_init(state, module) 2023-01-11T22:54:21.8837071Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8837223Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8837620Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8837769Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.8838106Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8838230Z return func(*args, **kwargs) 2023-01-11T22:54:21.8838612Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8838714Z p_assert( 2023-01-11T22:54:21.8839048Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8839157Z traceback.print_stack() 2023-01-11T22:54:21.8839391Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8839628Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8839762Z File "", line 1, in 2023-01-11T22:54:21.8839972Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8840115Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8840318Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8840473Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8840673Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8840779Z self.run() 2023-01-11T22:54:21.8840982Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8841128Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8841473Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8841605Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8841971Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8842155Z getattr(self, test_name)() 2023-01-11T22:54:21.8842507Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8842605Z fn() 2023-01-11T22:54:21.8842972Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8843143Z test(self, **param_kwargs) 2023-01-11T22:54:21.8843516Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8843640Z return func(*args, **kwargs) 2023-01-11T22:54:21.8843880Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8843977Z self.run_subtests( 2023-01-11T22:54:21.8844326Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8844495Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8844861Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8845015Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8845390Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8845511Z output = model(*input) 2023-01-11T22:54:21.8845839Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8845976Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8846340Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8846516Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8846890Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8847011Z 
_lazy_init(state, module) 2023-01-11T22:54:21.8847365Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8847533Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8847932Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8848077Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8848395Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8848521Z return func(*args, **kwargs) 2023-01-11T22:54:21.8848898Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8849005Z p_assert( 2023-01-11T22:54:21.8849341Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8849470Z traceback.print_stack() 2023-01-11T22:54:21.8849600Z File "", line 1, in 2023-01-11T22:54:21.8849808Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8849935Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8850141Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8850295Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8850508Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8850612Z self.run() 2023-01-11T22:54:21.8850815Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8850962Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8851356Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8851490Z self.run_test(test_name, pipe) 
2023-01-11T22:54:21.8851849Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8851972Z getattr(self, test_name)() 2023-01-11T22:54:21.8852380Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8852485Z fn() 2023-01-11T22:54:21.8853131Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8853269Z test(self, **param_kwargs) 2023-01-11T22:54:21.8853618Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8853742Z return func(*args, **kwargs) 2023-01-11T22:54:21.8853991Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8854104Z self.run_subtests( 2023-01-11T22:54:21.8854458Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8854621Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8854988Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8855143Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8855499Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8855618Z output = model(*input) 2023-01-11T22:54:21.8855944Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8856084Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8856467Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8856642Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8857010Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8857132Z _lazy_init(state, module) 2023-01-11T22:54:21.8857471Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8857641Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8858039Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8858182Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8858516Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8858645Z return func(*args, **kwargs) 2023-01-11T22:54:21.8859022Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8859126Z p_assert( 2023-01-11T22:54:21.8859445Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8859570Z traceback.print_stack() 2023-01-11T22:54:21.8860324Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8861065Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8861904Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8862700Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8863450Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8864185Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8864919Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8865652Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8866384Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8867114Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8867840Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8868573Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8869301Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8870029Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8870837Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8871608Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8871854Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8872090Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8872224Z File "", line 1, in 2023-01-11T22:54:21.8872439Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8872585Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8872775Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8872924Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8873144Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8873249Z self.run() 2023-01-11T22:54:21.8873453Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8873602Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8873954Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8874073Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8874443Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8874566Z getattr(self, test_name)() 2023-01-11T22:54:21.8874926Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8875026Z fn() 2023-01-11T22:54:21.8875394Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8875516Z test(self, **param_kwargs) 2023-01-11T22:54:21.8875877Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8875987Z return func(*args, **kwargs) 2023-01-11T22:54:21.8876225Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8876336Z self.run_subtests( 2023-01-11T22:54:21.8876692Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8876857Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8877224Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8877380Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8877756Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8877861Z output = model(*input) 2023-01-11T22:54:21.8878186Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8878326Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8878706Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8878881Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8879322Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8879446Z _lazy_init(state, module) 2023-01-11T22:54:21.8879803Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8879956Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8880440Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8880593Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8880940Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8881069Z return func(*args, **kwargs) 2023-01-11T22:54:21.8881449Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8881558Z p_assert( 2023-01-11T22:54:21.8881897Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8882009Z traceback.print_stack() 2023-01-11T22:54:21.8882138Z File "", line 1, in 2023-01-11T22:54:21.8882349Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8882497Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8882701Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8882854Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8883065Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8883169Z self.run() 2023-01-11T22:54:21.8883355Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8883500Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8883843Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8883975Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8884340Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8884466Z getattr(self, test_name)() 2023-01-11T22:54:21.8884832Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8884914Z fn() 2023-01-11T22:54:21.8885280Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8885405Z test(self, **param_kwargs) 2023-01-11T22:54:21.8885762Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.8885887Z return func(*args, **kwargs) 2023-01-11T22:54:21.8886131Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8886244Z self.run_subtests( 2023-01-11T22:54:21.8886601Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8886747Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8887117Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8887274Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8887653Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8887769Z output = model(*input) 2023-01-11T22:54:21.8888096Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8888298Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8888681Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8888840Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8889203Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8889369Z _lazy_init(state, module) 2023-01-11T22:54:21.8889734Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8889903Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8890300Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8890445Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.8890780Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8890908Z return func(*args, **kwargs) 2023-01-11T22:54:21.8891267Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8891369Z p_assert( 2023-01-11T22:54:21.8891703Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8891831Z traceback.print_stack() 2023-01-11T22:54:21.8892068Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8892304Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8892436Z File "", line 1, in 2023-01-11T22:54:21.8892631Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8892774Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8893163Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8893323Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8893537Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8893643Z self.run() 2023-01-11T22:54:21.8893847Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8894018Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8894353Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8894489Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8894849Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8894970Z getattr(self, test_name)() 2023-01-11T22:54:21.8895324Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8903046Z fn() 2023-01-11T22:54:21.8903522Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8903653Z test(self, **param_kwargs) 2023-01-11T22:54:21.8904024Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8904160Z return func(*args, **kwargs) 2023-01-11T22:54:21.8904406Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8904504Z self.run_subtests( 2023-01-11T22:54:21.8904867Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8905032Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8905404Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8905705Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8906098Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8906220Z output = model(*input) 2023-01-11T22:54:21.8906550Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8906735Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8907137Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8907315Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8907691Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8907814Z 
_lazy_init(state, module) 2023-01-11T22:54:21.8908173Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8908344Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8908742Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8908886Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8909213Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8909338Z return func(*args, **kwargs) 2023-01-11T22:54:21.8909722Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8909823Z p_assert( 2023-01-11T22:54:21.8910161Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8910288Z traceback.print_stack() 2023-01-11T22:54:21.8910420Z File "", line 1, in 2023-01-11T22:54:21.8910616Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8910758Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8910962Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8911113Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8911326Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8911431Z self.run() 2023-01-11T22:54:21.8911636Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8911785Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8912110Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8912242Z self.run_test(test_name, pipe) 
2023-01-11T22:54:21.8912612Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8912736Z getattr(self, test_name)() 2023-01-11T22:54:21.8913101Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8913202Z fn() 2023-01-11T22:54:21.8913570Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8913691Z test(self, **param_kwargs) 2023-01-11T22:54:21.8914032Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8914159Z return func(*args, **kwargs) 2023-01-11T22:54:21.8914399Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8914510Z self.run_subtests( 2023-01-11T22:54:21.8914865Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8915096Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8915470Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8915620Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8916025Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8916152Z output = model(*input) 2023-01-11T22:54:21.8916487Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8916628Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8917008Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8917186Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8917557Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8917676Z _lazy_init(state, module) 2023-01-11T22:54:21.8918012Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8918182Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8918583Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8918729Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8919070Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8919195Z return func(*args, **kwargs) 2023-01-11T22:54:21.8919571Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8919677Z p_assert( 2023-01-11T22:54:21.8920000Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8920128Z traceback.print_stack() 2023-01-11T22:54:21.8920369Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8920609Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8920739Z File "", line 1, in 2023-01-11T22:54:21.8920952Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8921096Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8921280Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8921432Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8921646Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8921751Z self.run() 2023-01-11T22:54:21.8921958Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8922104Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8922450Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8922584Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8922935Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8923058Z getattr(self, test_name)() 2023-01-11T22:54:21.8923423Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8923523Z fn() 2023-01-11T22:54:21.8923887Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8924069Z test(self, **param_kwargs) 2023-01-11T22:54:21.8924433Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8924561Z return func(*args, **kwargs) 2023-01-11T22:54:21.8924782Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8924900Z self.run_subtests( 2023-01-11T22:54:21.8925300Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8925467Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8925833Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8925988Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8926364Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8926487Z output = model(*input) 2023-01-11T22:54:21.8926797Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8926939Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8927316Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8927494Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8927864Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8927984Z _lazy_init(state, module) 2023-01-11T22:54:21.8928339Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8928508Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8928887Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8929029Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8929365Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8929489Z return func(*args, **kwargs) 2023-01-11T22:54:21.8929869Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8929973Z p_assert( 2023-01-11T22:54:21.8930306Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8930433Z traceback.print_stack() 2023-01-11T22:54:21.8930546Z File "", line 1, in 2023-01-11T22:54:21.8930758Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8930904Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8931108Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8931259Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8931469Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8931573Z self.run() 2023-01-11T22:54:21.8931761Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8931911Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8932254Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8932385Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8932747Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8933052Z getattr(self, test_name)() 2023-01-11T22:54:21.8933436Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8933631Z fn() 2023-01-11T22:54:21.8933992Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8934116Z test(self, **param_kwargs) 2023-01-11T22:54:21.8934476Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.8934665Z return func(*args, **kwargs) 2023-01-11T22:54:21.8934914Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8935030Z self.run_subtests( 2023-01-11T22:54:21.8935389Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8935550Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8935896Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8936056Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8936432Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8936551Z output = model(*input) 2023-01-11T22:54:21.8936877Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8937018Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8937394Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8937570Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8937919Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8938043Z _lazy_init(state, module) 2023-01-11T22:54:21.8938403Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8938574Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8938974Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8939122Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.8939464Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8939591Z return func(*args, **kwargs) 2023-01-11T22:54:21.8939969Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8940058Z p_assert( 2023-01-11T22:54:21.8940395Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8940522Z traceback.print_stack() 2023-01-11T22:54:21.8941278Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8942028Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8942773Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8943601Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8944390Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8945137Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8945878Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8946614Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8946855Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8947096Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8947226Z File "", line 1, in 2023-01-11T22:54:21.8947441Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8947575Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8947781Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8947937Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8948157Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8948261Z self.run() 2023-01-11T22:54:21.8948469Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8948616Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8948946Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8949081Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8949449Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8949579Z getattr(self, test_name)() 2023-01-11T22:54:21.8949941Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8950041Z fn() 2023-01-11T22:54:21.8950408Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8950533Z test(self, **param_kwargs) 2023-01-11T22:54:21.8950878Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8951002Z return func(*args, **kwargs) 2023-01-11T22:54:21.8951244Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8951359Z self.run_subtests( 2023-01-11T22:54:21.8951713Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8951939Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8952312Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8952468Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8952827Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8952950Z output = model(*input) 2023-01-11T22:54:21.8953322Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8953468Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8953856Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8954034Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8954405Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8954532Z _lazy_init(state, module) 2023-01-11T22:54:21.8954872Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8955042Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8955478Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8955626Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8955968Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8956096Z return func(*args, **kwargs) 2023-01-11T22:54:21.8956476Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8956580Z p_assert( 2023-01-11T22:54:21.8956907Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8957035Z traceback.print_stack() 2023-01-11T22:54:21.8957166Z File "", line 1, in 2023-01-11T22:54:21.8957377Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8957523Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8957730Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8957884Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8958095Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8958183Z self.run() 2023-01-11T22:54:21.8958387Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8958531Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8958874Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8959013Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8959378Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8959503Z getattr(self, test_name)() 2023-01-11T22:54:21.8959862Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8959945Z fn() 2023-01-11T22:54:21.8960317Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8960443Z test(self, **param_kwargs) 2023-01-11T22:54:21.8960800Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.8960925Z return func(*args, **kwargs) 2023-01-11T22:54:21.8961166Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8961344Z self.run_subtests( 2023-01-11T22:54:21.8961690Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8961851Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8962215Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8962422Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8962807Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8962926Z output = model(*input) 2023-01-11T22:54:21.8963254Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8963396Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8963755Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8963936Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8964303Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8964426Z _lazy_init(state, module) 2023-01-11T22:54:21.8964783Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8964951Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8965347Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8965491Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.8965828Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8965942Z return func(*args, **kwargs) 2023-01-11T22:54:21.8966319Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8966424Z p_assert( 2023-01-11T22:54:21.8966760Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8966883Z traceback.print_stack() 2023-01-11T22:54:21.8967125Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8967364Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8967495Z File "", line 1, in 2023-01-11T22:54:21.8967689Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8967833Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8968037Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8968191Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8968404Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8968510Z self.run() 2023-01-11T22:54:21.8968715Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8968844Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8969190Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8969325Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.8969687Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8969812Z getattr(self, test_name)() 2023-01-11T22:54:21.8970171Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8970272Z fn() 2023-01-11T22:54:21.8970705Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8970813Z test(self, **param_kwargs) 2023-01-11T22:54:21.8971172Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8971300Z return func(*args, **kwargs) 2023-01-11T22:54:21.8971587Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8971707Z self.run_subtests( 2023-01-11T22:54:21.8972065Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8972231Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8972595Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8972733Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8973264Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8973388Z output = model(*input) 2023-01-11T22:54:21.8973715Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8973854Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8974241Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8974420Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8974790Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8974894Z 
_lazy_init(state, module) 2023-01-11T22:54:21.8975251Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8975424Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8975820Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8975966Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8976305Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8976435Z return func(*args, **kwargs) 2023-01-11T22:54:21.8976816Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8976902Z p_assert( 2023-01-11T22:54:21.8977244Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8977369Z traceback.print_stack() 2023-01-11T22:54:21.8977499Z File "", line 1, in 2023-01-11T22:54:21.8977714Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.8977863Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.8978067Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.8978220Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.8978416Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.8978521Z self.run() 2023-01-11T22:54:21.8978732Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.8978877Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.8979218Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.8979351Z self.run_test(test_name, pipe) 
2023-01-11T22:54:21.8979714Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.8979915Z getattr(self, test_name)() 2023-01-11T22:54:21.8980286Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.8980386Z fn() 2023-01-11T22:54:21.8980751Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.8980879Z test(self, **param_kwargs) 2023-01-11T22:54:21.8981296Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.8981431Z return func(*args, **kwargs) 2023-01-11T22:54:21.8981673Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.8981771Z self.run_subtests( 2023-01-11T22:54:21.8982134Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.8982299Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.8982671Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.8982826Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.8983200Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.8983318Z output = model(*input) 2023-01-11T22:54:21.8983646Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.8983768Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.8984149Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.8984327Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.8984693Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.8984817Z _lazy_init(state, module) 2023-01-11T22:54:21.8985170Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.8985342Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.8985743Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.8985887Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.8986211Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.8986335Z return func(*args, **kwargs) 2023-01-11T22:54:21.8986715Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.8986819Z p_assert( 2023-01-11T22:54:21.8987153Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.8987283Z traceback.print_stack() 2023-01-11T22:54:21.8987521Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8987741Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8988498Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8989243Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8990055Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8990842Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8991590Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8992331Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8993066Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.8993801Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8994565Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8995298Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8996032Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8996764Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8997004Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8997241Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.8997474Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8997704Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8997933Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8998161Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.8998967Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.8999749Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9000501Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9001240Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9001457Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.9001701Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9001935Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9002167Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9002904Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9003644Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9004374Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9005111Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9005354Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9005589Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.9005826Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9006056Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9006267Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9006488Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9007224Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9008088Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9008837Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9009572Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9009810Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.9010042Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9010277Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9010508Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9010738Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9010961Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9011170Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9011403Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9012144Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9013070Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9013825Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9014560Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9014797Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9015032Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9015265Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9015494Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9015722Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9016035Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9016777Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9017568Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9018324Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.9019057Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9019279Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9019509Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9019739Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9019967Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9020706Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9021448Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9022179Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9022909Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9023148Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9023379Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9023614Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9023845Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9024054Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9024281Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9025017Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9025860Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9026606Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.9027341Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9027582Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9027816Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9028046Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9028273Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9028500Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9028726Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9028822Z dist init r=1, world=2 2023-01-11T22:54:21.9029159Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9029475Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9029787Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9030091Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9030391Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9030693Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9030997Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9031295Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9031596Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9031891Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9032191Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9032565Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9032679Z dist init r=0, world=2 2023-01-11T22:54:21.9033004Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9033361Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9033673Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9033980Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9034286Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9034590Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9034895Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9035196Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9035495Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9035778Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9036081Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9036385Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9036487Z ok (10.324s) 2023-01-11T22:54:21.9036819Z test_transformer_offload_true_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89860 2023-01-11T22:54:21.9037038Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89861 2023-01-11T22:54:21.9037428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.9037613Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.9038001Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.9038177Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.9038550Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T22:54:21.9038732Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T22:54:21.9039116Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T22:54:21.9039306Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T22:54:21.9039552Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2023-01-11T22:54:21.9039795Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2023-01-11T22:54:21.9040273Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2023-01-11T22:54:21.9040672Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2023-01-11T22:54:21.9040886Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2023-01-11T22:54:21.9041160Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2023-01-11T22:54:21.9041402Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9041634Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9042662Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2023-01-11T22:54:21.9042783Z warnings.warn( 2023-01-11T22:54:21.9043811Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:782: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2023-01-11T22:54:21.9043924Z warnings.warn( 2023-01-11T22:54:21.9044054Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.9044184Z File "<string>", line 1, in <module> 2023-01-11T22:54:21.9044406Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9044533Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9044746Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9044889Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9045092Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9045247Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9045449Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9045600Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9045796Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9045900Z self.run() 2023-01-11T22:54:21.9046108Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9046213Z self.run() 2023-01-11T22:54:21.9046417Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9046564Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9046768Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9046914Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9047251Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9047385Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9047727Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9047858Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9048224Z File
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9048349Z getattr(self, test_name)() 2023-01-11T22:54:21.9048788Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9048912Z getattr(self, test_name)() 2023-01-11T22:54:21.9049258Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9049358Z fn() 2023-01-11T22:54:21.9049768Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9049870Z fn() 2023-01-11T22:54:21.9050245Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9050368Z test(self, **param_kwargs) 2023-01-11T22:54:21.9050731Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9050836Z test(self, **param_kwargs) 2023-01-11T22:54:21.9051193Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9051325Z return func(*args, **kwargs) 2023-01-11T22:54:21.9051683Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9051808Z return func(*args, **kwargs) 2023-01-11T22:54:21.9052051Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9052170Z self.run_subtests( 2023-01-11T22:54:21.9052414Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9052510Z self.run_subtests( 2023-01-11T22:54:21.9053082Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9053258Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9053618Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9053786Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9054148Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9054302Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9054664Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9054796Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9055172Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9055292Z output = model(*input) 2023-01-11T22:54:21.9055669Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9055798Z output = model(*input) 2023-01-11T22:54:21.9056123Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9056263Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9056590Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9056710Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9057092Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9057268Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9057649Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9057822Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9058190Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9058411Z _lazy_init(state, module) 2023-01-11T22:54:21.9058782Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9058906Z _lazy_init(state, module) 2023-01-11T22:54:21.9059243Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9059475Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9059848Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9060019Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9060418Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9060565Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9060971Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9061115Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9061438Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9061564Z return func(*args, **kwargs) 2023-01-11T22:54:21.9061909Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9062036Z return func(*args, **kwargs) 2023-01-11T22:54:21.9062416Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9062519Z p_assert( 2023-01-11T22:54:21.9062896Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9062994Z p_assert( 2023-01-11T22:54:21.9063322Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9063451Z traceback.print_stack() 2023-01-11T22:54:21.9063789Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9063916Z traceback.print_stack() 2023-01-11T22:54:21.9064152Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9064390Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9065144Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9065885Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9066631Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.9067373Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9068110Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9068967Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9069713Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9070448Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9071188Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9071915Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9072045Z File "", line 1, in 2023-01-11T22:54:21.9072246Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9072390Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9072596Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9072751Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9072967Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9073076Z self.run() 2023-01-11T22:54:21.9073282Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9073413Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9073763Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9073896Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9074263Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9074391Z getattr(self, test_name)() 2023-01-11T22:54:21.9074754Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9074855Z fn() 2023-01-11T22:54:21.9075216Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9075321Z test(self, **param_kwargs) 2023-01-11T22:54:21.9075683Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9075808Z return func(*args, **kwargs) 2023-01-11T22:54:21.9076049Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9076159Z self.run_subtests( 2023-01-11T22:54:21.9076515Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9076739Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9077110Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9077246Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9077620Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9077792Z output = model(*input) 2023-01-11T22:54:21.9078133Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9078272Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9078649Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9078824Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9079192Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9079301Z _lazy_init(state, module) 2023-01-11T22:54:21.9079658Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9079825Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9080226Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in 
_share_state_and_init_handle_attrs 2023-01-11T22:54:21.9080371Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9080711Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9080838Z return func(*args, **kwargs) 2023-01-11T22:54:21.9081216Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9081319Z p_assert( 2023-01-11T22:54:21.9081643Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9081770Z traceback.print_stack() 2023-01-11T22:54:21.9081901Z File "", line 1, in 2023-01-11T22:54:21.9082112Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9082256Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9082460Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9082613Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9082809Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9082914Z self.run() 2023-01-11T22:54:21.9083116Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9083263Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9083603Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9083739Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9084104Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9084231Z getattr(self, test_name)() 2023-01-11T22:54:21.9084574Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9084676Z fn() 
2023-01-11T22:54:21.9085044Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9085170Z test(self, **param_kwargs) 2023-01-11T22:54:21.9085527Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9085656Z return func(*args, **kwargs) 2023-01-11T22:54:21.9085896Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9086068Z self.run_subtests( 2023-01-11T22:54:21.9086413Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9086576Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9086943Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9087150Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9087540Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9087665Z output = model(*input) 2023-01-11T22:54:21.9087991Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9088131Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9088494Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9088675Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9089041Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9089161Z _lazy_init(state, module) 2023-01-11T22:54:21.9089516Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 
163, in _lazy_init 2023-01-11T22:54:21.9089685Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9090081Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9090222Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9090543Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9090674Z return func(*args, **kwargs) 2023-01-11T22:54:21.9091054Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9091158Z p_assert( 2023-01-11T22:54:21.9091495Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9091622Z traceback.print_stack() 2023-01-11T22:54:21.9091866Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9092103Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.9092215Z File "", line 1, in 2023-01-11T22:54:21.9092426Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9092568Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9092772Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9093078Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9093302Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9093412Z self.run() 2023-01-11T22:54:21.9093598Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9093747Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9094103Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9094237Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9094595Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9094717Z getattr(self, test_name)() 2023-01-11T22:54:21.9095096Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9095280Z fn() 2023-01-11T22:54:21.9095637Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9095761Z test(self, **param_kwargs) 2023-01-11T22:54:21.9096119Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9096246Z return func(*args, **kwargs) 2023-01-11T22:54:21.9096547Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9096669Z self.run_subtests( 2023-01-11T22:54:21.9097030Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9097197Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9097543Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9097703Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9098082Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9098205Z output = model(*input) 2023-01-11T22:54:21.9098533Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9098673Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9099051Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9099229Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9099582Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9099702Z _lazy_init(state, module) 2023-01-11T22:54:21.9100053Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9100226Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9100622Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9100766Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9101104Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9101236Z return func(*args, **kwargs) 2023-01-11T22:54:21.9101595Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9101701Z p_assert( 2023-01-11T22:54:21.9102038Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9102165Z traceback.print_stack() 2023-01-11T22:54:21.9102297Z File "", line 1, in 2023-01-11T22:54:21.9102511Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9102658Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9102861Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9102995Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9103208Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9103315Z self.run() 2023-01-11T22:54:21.9103518Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9103663Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9104005Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9104141Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9104489Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9104675Z getattr(self, test_name)() 2023-01-11T22:54:21.9105046Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9105146Z fn() 2023-01-11T22:54:21.9105513Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9105642Z test(self, **param_kwargs) 2023-01-11T22:54:21.9106043Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.9106171Z return func(*args, **kwargs) 2023-01-11T22:54:21.9106395Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9106511Z self.run_subtests( 2023-01-11T22:54:21.9106873Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9107044Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9107409Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9107564Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9107940Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9108065Z output = model(*input) 2023-01-11T22:54:21.9108375Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9108515Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9108893Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9109069Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9109437Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9109563Z _lazy_init(state, module) 2023-01-11T22:54:21.9109920Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9110089Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9110492Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9110620Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.9110957Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9111083Z return func(*args, **kwargs) 2023-01-11T22:54:21.9111459Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9111563Z p_assert( 2023-01-11T22:54:21.9111904Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9112031Z traceback.print_stack() 2023-01-11T22:54:21.9112251Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9112487Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9112617Z File "", line 1, in 2023-01-11T22:54:21.9112830Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9112976Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9113180Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9113332Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9113547Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9113635Z self.run() 2023-01-11T22:54:21.9113900Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9114046Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9114394Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9114528Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9114940Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9115068Z getattr(self, test_name)() 2023-01-11T22:54:21.9115433Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9115514Z fn() 2023-01-11T22:54:21.9115874Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9115996Z test(self, **param_kwargs) 2023-01-11T22:54:21.9116353Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9116483Z return func(*args, **kwargs) 2023-01-11T22:54:21.9116724Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9116838Z self.run_subtests( 2023-01-11T22:54:21.9117194Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9117345Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9117710Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9117864Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9118239Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9118358Z output = model(*input) 2023-01-11T22:54:21.9118689Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9118828Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9119205Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9119363Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9119732Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9119854Z 
_lazy_init(state, module) 2023-01-11T22:54:21.9120206Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9120375Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9120771Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9120917Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9121253Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9121362Z return func(*args, **kwargs) 2023-01-11T22:54:21.9121740Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9121842Z p_assert( 2023-01-11T22:54:21.9122179Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9122304Z traceback.print_stack() 2023-01-11T22:54:21.9122433Z File "", line 1, in 2023-01-11T22:54:21.9122644Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9122786Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9122972Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9123187Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9123399Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9123503Z self.run() 2023-01-11T22:54:21.9123708Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9123855Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9124247Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9124371Z self.run_test(test_name, pipe) 
2023-01-11T22:54:21.9124741Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9124865Z getattr(self, test_name)() 2023-01-11T22:54:21.9125226Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9125327Z fn() 2023-01-11T22:54:21.9125694Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9125819Z test(self, **param_kwargs) 2023-01-11T22:54:21.9126176Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9126284Z return func(*args, **kwargs) 2023-01-11T22:54:21.9126528Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9126643Z self.run_subtests( 2023-01-11T22:54:21.9126997Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9127159Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9127522Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9127673Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9128053Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9128156Z output = model(*input) 2023-01-11T22:54:21.9128480Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9128621Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9129004Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9129181Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9129547Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9129670Z _lazy_init(state, module) 2023-01-11T22:54:21.9130024Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9130179Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9130576Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9130718Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9131056Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9131182Z return func(*args, **kwargs) 2023-01-11T22:54:21.9131561Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9131664Z p_assert( 2023-01-11T22:54:21.9131999Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9132108Z traceback.print_stack() 2023-01-11T22:54:21.9133053Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9133980Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.9134743Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9135486Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9136231Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9136962Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9137694Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9138440Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9138684Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9138918Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9139049Z File "", line 1, in 2023-01-11T22:54:21.9139256Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9139403Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9139606Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9139740Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9139953Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9140060Z self.run() 2023-01-11T22:54:21.9140264Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9140415Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9140759Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9140890Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9141258Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9141365Z getattr(self, test_name)() 2023-01-11T22:54:21.9141728Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9141885Z fn() 2023-01-11T22:54:21.9142260Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9142384Z test(self, **param_kwargs) 2023-01-11T22:54:21.9142742Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 
2023-01-11T22:54:21.9142916Z return func(*args, **kwargs) 2023-01-11T22:54:21.9143162Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9143259Z self.run_subtests( 2023-01-11T22:54:21.9143621Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9143779Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9144140Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9144299Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9144674Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9144794Z output = model(*input) 2023-01-11T22:54:21.9145122Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9145244Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9145622Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9145797Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9146166Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9146291Z _lazy_init(state, module) 2023-01-11T22:54:21.9146641Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9146803Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9147199Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9147326Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.9147666Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9147792Z return func(*args, **kwargs) 2023-01-11T22:54:21.9148171Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9148274Z p_assert( 2023-01-11T22:54:21.9148612Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9148739Z traceback.print_stack() 2023-01-11T22:54:21.9148868Z File "", line 1, in 2023-01-11T22:54:21.9149062Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9149205Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9149405Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9149555Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9149772Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9149875Z self.run() 2023-01-11T22:54:21.9150079Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9150211Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9150550Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9150679Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9151115Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9151239Z getattr(self, test_name)() 2023-01-11T22:54:21.9151599Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9151700Z fn() 2023-01-11T22:54:21.9152109Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 248, in instantiated_test 2023-01-11T22:54:21.9152222Z test(self, **param_kwargs) 2023-01-11T22:54:21.9152584Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9152707Z return func(*args, **kwargs) 2023-01-11T22:54:21.9152950Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9153061Z self.run_subtests( 2023-01-11T22:54:21.9153415Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9153576Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9153937Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9154073Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9154448Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9154563Z output = model(*input) 2023-01-11T22:54:21.9154886Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9155023Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9155401Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9155574Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9155941Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9156044Z _lazy_init(state, module) 2023-01-11T22:54:21.9156398Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9156564Z _share_state_and_init_handle_attrs(state, root_module) 
2023-01-11T22:54:21.9156961Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9157103Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9157440Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9157565Z return func(*args, **kwargs) 2023-01-11T22:54:21.9157942Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9158048Z p_assert( 2023-01-11T22:54:21.9158370Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9158497Z traceback.print_stack() 2023-01-11T22:54:21.9158736Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9158971Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.9159098Z File "", line 1, in 2023-01-11T22:54:21.9159309Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9159451Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9159636Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9159788Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9160065Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9160170Z self.run() 2023-01-11T22:54:21.9160377Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9160522Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9160871Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9161045Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9161403Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9161526Z getattr(self, test_name)() 2023-01-11T22:54:21.9161887Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9161987Z fn() 2023-01-11T22:54:21.9162354Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9162480Z test(self, **param_kwargs) 2023-01-11T22:54:21.9162828Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9162948Z return func(*args, **kwargs) 2023-01-11T22:54:21.9163170Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9163281Z self.run_subtests( 2023-01-11T22:54:21.9163632Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9163790Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9164152Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9164305Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9164679Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9164804Z output = model(*input) 2023-01-11T22:54:21.9165114Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9165253Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9165630Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9165801Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9166166Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9166286Z _lazy_init(state, module) 2023-01-11T22:54:21.9166637Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9166806Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9167191Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9167331Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9167662Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9167785Z return func(*args, **kwargs) 2023-01-11T22:54:21.9168160Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9168261Z p_assert( 2023-01-11T22:54:21.9168593Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9168721Z traceback.print_stack() 2023-01-11T22:54:21.9168834Z File "", line 1, in 2023-01-11T22:54:21.9169040Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9169247Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9169454Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9169608Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9169819Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9169922Z self.run() 2023-01-11T22:54:21.9170112Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9170336Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9170695Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9170829Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9171191Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9171310Z getattr(self, test_name)() 2023-01-11T22:54:21.9171666Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9171769Z fn() 2023-01-11T22:54:21.9172118Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9172241Z test(self, **param_kwargs) 2023-01-11T22:54:21.9172593Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.9172714Z return func(*args, **kwargs) 2023-01-11T22:54:21.9173086Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9173200Z self.run_subtests( 2023-01-11T22:54:21.9173557Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9173720Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9174067Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9174223Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9174602Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9174719Z output = model(*input) 2023-01-11T22:54:21.9175048Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9175186Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9175564Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9175737Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9176088Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9176210Z _lazy_init(state, module) 2023-01-11T22:54:21.9176562Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9176724Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9177116Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9177261Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.9177599Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9177723Z return func(*args, **kwargs) 2023-01-11T22:54:21.9178083Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9178187Z p_assert( 2023-01-11T22:54:21.9178522Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9178732Z traceback.print_stack() 2023-01-11T22:54:21.9178970Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9179206Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9179334Z File "", line 1, in 2023-01-11T22:54:21.9179542Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9179723Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9179927Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9180075Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9180286Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9180393Z self.run() 2023-01-11T22:54:21.9180598Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9180750Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9181097Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9181214Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9181575Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9181697Z getattr(self, test_name)() 2023-01-11T22:54:21.9182059Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9182151Z fn() 2023-01-11T22:54:21.9182511Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9182628Z test(self, **param_kwargs) 2023-01-11T22:54:21.9182967Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9183092Z return func(*args, **kwargs) 2023-01-11T22:54:21.9183331Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9183440Z self.run_subtests( 2023-01-11T22:54:21.9183792Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9183956Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9184319Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9184470Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9184838Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9184942Z output = model(*input) 2023-01-11T22:54:21.9185264Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9185402Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9185776Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9185946Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9186309Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9186434Z 
_lazy_init(state, module) 2023-01-11T22:54:21.9186785Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9186939Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9187339Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9187484Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9187892Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9188014Z return func(*args, **kwargs) 2023-01-11T22:54:21.9188391Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9188491Z p_assert( 2023-01-11T22:54:21.9188823Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9188981Z traceback.print_stack() 2023-01-11T22:54:21.9189116Z File "", line 1, in 2023-01-11T22:54:21.9189326Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9189468Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9189666Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9189813Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9190025Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9190116Z self.run() 2023-01-11T22:54:21.9190316Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9190459Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9190802Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9190936Z self.run_test(test_name, pipe) 
2023-01-11T22:54:21.9191296Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9191419Z getattr(self, test_name)() 2023-01-11T22:54:21.9191775Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9191856Z fn() 2023-01-11T22:54:21.9192219Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9192344Z test(self, **param_kwargs) 2023-01-11T22:54:21.9192698Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9192823Z return func(*args, **kwargs) 2023-01-11T22:54:21.9193063Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9193175Z self.run_subtests( 2023-01-11T22:54:21.9193527Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9193673Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9194036Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9194187Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9194559Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9194677Z output = model(*input) 2023-01-11T22:54:21.9194999Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9195137Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9195535Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9195700Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9196067Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9196189Z _lazy_init(state, module) 2023-01-11T22:54:21.9196537Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9196705Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9197183Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9197327Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9197658Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9197768Z return func(*args, **kwargs) 2023-01-11T22:54:21.9198197Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9198304Z p_assert( 2023-01-11T22:54:21.9198641Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9198764Z traceback.print_stack() 2023-01-11T22:54:21.9199515Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9200262Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.9201006Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9201746Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9202487Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9203217Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9203947Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9204680Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9205407Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9206131Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9206930Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9207706Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9208447Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9209173Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9209902Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9210624Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9210872Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9211110Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.9211225Z File "", line 1, in 2023-01-11T22:54:21.9211442Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9211587Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9211790Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9211940Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9212154Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9212257Z self.run() 2023-01-11T22:54:21.9212445Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9212593Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9213125Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9213267Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9213635Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9213763Z getattr(self, test_name)() 2023-01-11T22:54:21.9214128Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9214227Z fn() 2023-01-11T22:54:21.9214578Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9214702Z test(self, **param_kwargs) 2023-01-11T22:54:21.9215064Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9215274Z return func(*args, **kwargs) 2023-01-11T22:54:21.9215516Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9215630Z self.run_subtests( 2023-01-11T22:54:21.9215989Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9216154Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9216565Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9216732Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9217114Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9217234Z output = model(*input) 2023-01-11T22:54:21.9217561Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9217704Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9218083Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9218258Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9218613Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9218734Z _lazy_init(state, module) 2023-01-11T22:54:21.9219083Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9219245Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9219635Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9219776Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9220117Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9220242Z return func(*args, **kwargs) 2023-01-11T22:54:21.9220601Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9220706Z p_assert( 2023-01-11T22:54:21.9221044Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9221174Z traceback.print_stack() 2023-01-11T22:54:21.9221305Z File "", line 1, in 2023-01-11T22:54:21.9221515Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9221657Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9221861Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9221996Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9222213Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9222319Z self.run() 2023-01-11T22:54:21.9222521Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9222664Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9223002Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9223138Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9223484Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9223609Z getattr(self, test_name)() 2023-01-11T22:54:21.9223970Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9224061Z fn() 2023-01-11T22:54:21.9224423Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9224608Z test(self, **param_kwargs) 2023-01-11T22:54:21.9224967Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.9225092Z return func(*args, **kwargs) 2023-01-11T22:54:21.9225317Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9225480Z self.run_subtests( 2023-01-11T22:54:21.9225846Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9226008Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9226374Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9226528Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9226904Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9227025Z output = model(*input) 2023-01-11T22:54:21.9227330Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9227473Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9227850Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9228024Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9228386Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9228507Z _lazy_init(state, module) 2023-01-11T22:54:21.9228858Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9229027Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9229430Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9229557Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.9229895Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9230024Z return func(*args, **kwargs) 2023-01-11T22:54:21.9230404Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9230506Z p_assert( 2023-01-11T22:54:21.9230841Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9230969Z traceback.print_stack() 2023-01-11T22:54:21.9231190Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9231428Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9231564Z File "", line 1, in 2023-01-11T22:54:21.9231775Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9231918Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9232115Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9232268Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9232479Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9232568Z self.run() 2023-01-11T22:54:21.9232771Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9232917Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9233255Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9233451Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9233813Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9233931Z getattr(self, test_name)() 2023-01-11T22:54:21.9234288Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9234371Z fn() 2023-01-11T22:54:21.9234789Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9234921Z test(self, **param_kwargs) 2023-01-11T22:54:21.9235292Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9235416Z return func(*args, **kwargs) 2023-01-11T22:54:21.9235654Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9235766Z self.run_subtests( 2023-01-11T22:54:21.9236121Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9236269Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9236630Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9236784Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9237155Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9237271Z output = model(*input) 2023-01-11T22:54:21.9237592Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9237730Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9238106Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9238267Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9238633Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9238756Z 
_lazy_init(state, module) 2023-01-11T22:54:21.9239110Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9239280Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9239673Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9239820Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9240154Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9240264Z return func(*args, **kwargs) 2023-01-11T22:54:21.9240639Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9240742Z p_assert( 2023-01-11T22:54:21.9241081Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9241206Z traceback.print_stack() 2023-01-11T22:54:21.9241336Z File "", line 1, in 2023-01-11T22:54:21.9241551Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9241691Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9241877Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9242028Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9242241Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9242344Z self.run() 2023-01-11T22:54:21.9242544Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9242758Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9243107Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9243225Z self.run_test(test_name, pipe) 
2023-01-11T22:54:21.9243586Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9243711Z getattr(self, test_name)() 2023-01-11T22:54:21.9244119Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9244222Z fn() 2023-01-11T22:54:21.9244593Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9244720Z test(self, **param_kwargs) 2023-01-11T22:54:21.9245070Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9245183Z return func(*args, **kwargs) 2023-01-11T22:54:21.9245424Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9245536Z self.run_subtests( 2023-01-11T22:54:21.9245890Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9246052Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9246420Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9246575Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9246948Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9247052Z output = model(*input) 2023-01-11T22:54:21.9247378Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9247518Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9247896Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9248069Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9248440Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9248558Z _lazy_init(state, module) 2023-01-11T22:54:21.9248910Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9249063Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9249459Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9249603Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9249944Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9250068Z return func(*args, **kwargs) 2023-01-11T22:54:21.9250448Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9250549Z p_assert( 2023-01-11T22:54:21.9250889Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9250999Z traceback.print_stack() 2023-01-11T22:54:21.9251235Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9251469Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.9251600Z File "", line 1, in 2023-01-11T22:54:21.9251810Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9252016Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9252221Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9252373Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9252569Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9252678Z self.run() 2023-01-11T22:54:21.9253162Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9253330Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9253684Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9253823Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9254187Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9254312Z getattr(self, test_name)() 2023-01-11T22:54:21.9254660Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9254759Z fn() 2023-01-11T22:54:21.9255123Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9255248Z test(self, **param_kwargs) 2023-01-11T22:54:21.9255609Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9255737Z return func(*args, **kwargs) 2023-01-11T22:54:21.9256008Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9256106Z self.run_subtests( 2023-01-11T22:54:21.9256458Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9256621Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9256991Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9257144Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9257517Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9257633Z output = model(*input) 2023-01-11T22:54:21.9257962Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9258100Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9258464Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9258642Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9259008Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9259130Z _lazy_init(state, module) 2023-01-11T22:54:21.9259485Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9259654Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9260051Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9260195Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9260513Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9260639Z return func(*args, **kwargs) 2023-01-11T22:54:21.9261017Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9261121Z p_assert( 2023-01-11T22:54:21.9261458Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9261681Z traceback.print_stack() 2023-01-11T22:54:21.9261811Z File "", line 1, in 2023-01-11T22:54:21.9262006Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9262148Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9262346Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9262543Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9262760Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9262866Z self.run() 2023-01-11T22:54:21.9263122Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9263272Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9263603Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9263740Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9264099Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9264222Z getattr(self, test_name)() 2023-01-11T22:54:21.9264580Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9264676Z fn() 2023-01-11T22:54:21.9265041Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9265167Z test(self, **param_kwargs) 2023-01-11T22:54:21.9265506Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.9265633Z return func(*args, **kwargs) 2023-01-11T22:54:21.9265874Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9265984Z self.run_subtests( 2023-01-11T22:54:21.9266341Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9266502Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9266863Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9267015Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9267376Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9267495Z output = model(*input) 2023-01-11T22:54:21.9267820Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9267955Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9268334Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9268514Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9268878Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9269000Z _lazy_init(state, module) 2023-01-11T22:54:21.9269336Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9269507Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9269907Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9270053Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.9270390Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9270515Z return func(*args, **kwargs) 2023-01-11T22:54:21.9270890Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9271055Z p_assert( 2023-01-11T22:54:21.9271381Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9271507Z traceback.print_stack() 2023-01-11T22:54:21.9272309Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9273068Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9273816Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9274554Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.9275287Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9276022Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9276762Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9277498Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9277737Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9277969Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.9278099Z File "", line 1, in 2023-01-11T22:54:21.9278318Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9278463Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9278656Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9278807Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9279018Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9279126Z self.run() 2023-01-11T22:54:21.9279331Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9279477Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9279887Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9280022Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9280370Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9280492Z getattr(self, test_name)() 2023-01-11T22:54:21.9280899Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9281005Z fn() 2023-01-11T22:54:21.9281380Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9281507Z test(self, **param_kwargs) 2023-01-11T22:54:21.9281866Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9281989Z return func(*args, **kwargs) 2023-01-11T22:54:21.9282211Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9282325Z self.run_subtests( 2023-01-11T22:54:21.9282679Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9282841Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9283209Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9283363Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9283738Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9283861Z output = model(*input) 2023-01-11T22:54:21.9284170Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9284311Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9284697Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9284873Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9285242Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9285363Z _lazy_init(state, module) 2023-01-11T22:54:21.9285719Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9285886Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9286267Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9286410Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9286752Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9286880Z return func(*args, **kwargs) 2023-01-11T22:54:21.9287260Z File 
"/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9287364Z p_assert( 2023-01-11T22:54:21.9287704Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9287835Z traceback.print_stack() 2023-01-11T22:54:21.9287950Z File "", line 1, in 2023-01-11T22:54:21.9288163Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9288307Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9288511Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9288659Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9288866Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9289031Z self.run() 2023-01-11T22:54:21.9289219Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9289368Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9289717Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9289853Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9290264Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9290391Z getattr(self, test_name)() 2023-01-11T22:54:21.9290756Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9290854Z fn() 2023-01-11T22:54:21.9291202Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9291325Z test(self, **param_kwargs) 2023-01-11T22:54:21.9291688Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in 
wrapper 2023-01-11T22:54:21.9291809Z return func(*args, **kwargs) 2023-01-11T22:54:21.9292045Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9292158Z self.run_subtests( 2023-01-11T22:54:21.9292519Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9292681Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9293175Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9293338Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9293719Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9293844Z output = model(*input) 2023-01-11T22:54:21.9294171Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9294309Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9294681Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9294854Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9295206Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9295322Z _lazy_init(state, module) 2023-01-11T22:54:21.9295674Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9295840Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9296266Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9296414Z handle.init_flat_param_attributes() 
2023-01-11T22:54:21.9296749Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9296873Z return func(*args, **kwargs) 2023-01-11T22:54:21.9297234Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9297335Z p_assert( 2023-01-11T22:54:21.9297668Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9297796Z traceback.print_stack() 2023-01-11T22:54:21.9298031Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9298267Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9298391Z File "", line 1, in 2023-01-11T22:54:21.9298685Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9298813Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9299019Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9299171Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9299385Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9299549Z self.run() 2023-01-11T22:54:21.9299755Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9299903Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9300251Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9300368Z self.run_test(test_name, pipe) 2023-01-11T22:54:21.9300733Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9300863Z getattr(self, test_name)() 2023-01-11T22:54:21.9301220Z File 
"/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9301320Z fn() 2023-01-11T22:54:21.9301683Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9301811Z test(self, **param_kwargs) 2023-01-11T22:54:21.9302152Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9302277Z return func(*args, **kwargs) 2023-01-11T22:54:21.9302514Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9302628Z self.run_subtests( 2023-01-11T22:54:21.9302985Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9303154Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9303517Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9303669Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9304029Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9304152Z output = model(*input) 2023-01-11T22:54:21.9304482Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9304618Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9304992Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9305168Z args, kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9305543Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9305664Z 
_lazy_init(state, module) 2023-01-11T22:54:21.9306016Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9306167Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9306563Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9306705Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9307040Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9307165Z return func(*args, **kwargs) 2023-01-11T22:54:21.9307538Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9307705Z p_assert( 2023-01-11T22:54:21.9308049Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9308160Z traceback.print_stack() 2023-01-11T22:54:21.9308289Z File "", line 1, in 2023-01-11T22:54:21.9308497Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2023-01-11T22:54:21.9308642Z exitcode = _main(fd, parent_sentinel) 2023-01-11T22:54:21.9308890Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2023-01-11T22:54:21.9309051Z return self._bootstrap(parent_sentinel) 2023-01-11T22:54:21.9309267Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap 2023-01-11T22:54:21.9309354Z self.run() 2023-01-11T22:54:21.9309559Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2023-01-11T22:54:21.9309704Z self._target(*self._args, **self._kwargs) 2023-01-11T22:54:21.9310053Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 824, in _run 2023-01-11T22:54:21.9310188Z self.run_test(test_name, pipe) 
2023-01-11T22:54:21.9310553Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test 2023-01-11T22:54:21.9310680Z getattr(self, test_name)() 2023-01-11T22:54:21.9311045Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 536, in wrapper 2023-01-11T22:54:21.9311127Z fn() 2023-01-11T22:54:21.9311495Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 248, in instantiated_test 2023-01-11T22:54:21.9311618Z test(self, **param_kwargs) 2023-01-11T22:54:21.9311974Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 167, in wrapper 2023-01-11T22:54:21.9312098Z return func(*args, **kwargs) 2023-01-11T22:54:21.9312340Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_core.py", line 162, in test_transformer 2023-01-11T22:54:21.9312455Z self.run_subtests( 2023-01-11T22:54:21.9312809Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 788, in run_subtests 2023-01-11T22:54:21.9312956Z test_fn(*test_args, **test_kwargs, **subtest_kwargs) 2023-01-11T22:54:21.9313324Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 1024, in _test_fsdp_parity 2023-01-11T22:54:21.9313478Z fsdp_loss = self._train_for_several_steps( 2023-01-11T22:54:21.9313853Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 860, in _train_for_several_steps 2023-01-11T22:54:21.9313973Z output = model(*input) 2023-01-11T22:54:21.9314299Z File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl 2023-01-11T22:54:21.9314435Z return forward_call(*args, **kwargs) 2023-01-11T22:54:21.9314815Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 663, in forward 2023-01-11T22:54:21.9314973Z args, 
kwargs = _root_pre_forward(self, self, args, kwargs) 2023-01-11T22:54:21.9315338Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 483, in _root_pre_forward 2023-01-11T22:54:21.9315458Z _lazy_init(state, module) 2023-01-11T22:54:21.9315817Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 163, in _lazy_init 2023-01-11T22:54:21.9315983Z _share_state_and_init_handle_attrs(state, root_module) 2023-01-11T22:54:21.9316377Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_runtime_utils.py", line 178, in _share_state_and_init_handle_attrs 2023-01-11T22:54:21.9316520Z handle.init_flat_param_attributes() 2023-01-11T22:54:21.9316856Z File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 34, in decorate_context 2023-01-11T22:54:21.9317058Z return func(*args, **kwargs) 2023-01-11T22:54:21.9317446Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 775, in init_flat_param_attributes 2023-01-11T22:54:21.9317552Z p_assert( 2023-01-11T22:54:21.9317889Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/_utils.py", line 116, in p_assert 2023-01-11T22:54:21.9318017Z traceback.print_stack() 2023-01-11T22:54:21.9318301Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9318544Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9319303Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9320047Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9320790Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9321529Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9322267Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9322999Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9323733Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.9324470Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9325199Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9325928Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9326742Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9327514Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9327742Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9327978Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.9328212Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9328449Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9328677Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9328896Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9329645Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9330381Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9331120Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9331854Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9332087Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.9332318Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9332531Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9332768Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9333697Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9334436Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9335172Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9335994Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9336284Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9336522Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.9336754Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9336989Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9337218Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9337449Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9338196Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9338908Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9339652Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9340388Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9340626Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2023-01-11T22:54:21.9340861Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9341093Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9341323Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9341551Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9341782Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9342008Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9342235Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9342978Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9343713Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9344497Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9345273Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9345513Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9345747Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9345979Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9346215Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9346441Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9346668Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9347417Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9348148Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9348888Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.9349627Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9349861Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9350092Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9350308Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9350535Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9351277Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9352010Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9352741Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9353581Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9353816Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9354047Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9354280Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9354511Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9354746Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9354969Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9355696Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9356427Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9357164Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2023-01-11T22:54:21.9357901Z [W python_variable.cpp:319] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2023-01-11T22:54:21.9358136Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9358365Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9358590Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9358820Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9359056Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9359281Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2023-01-11T22:54:21.9359392Z dist init r=1, world=2 2023-01-11T22:54:21.9359724Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9360029Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9360337Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9360641Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9361003Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9361305Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9361645Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9361951Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9362253Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9362557Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9362857Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9363159Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:1 after the FSDP constructor. 2023-01-11T22:54:21.9363273Z dist init r=0, world=2 2023-01-11T22:54:21.9363581Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9363893Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9364197Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. 
You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9364504Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9364811Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9365111Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9365409Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9365708Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9366011Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9366309Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9366612Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 2023-01-11T22:54:21.9366895Z Expects the `FlatParameter` to be offloaded to CPU since CPU offloading is enabled. You may be accidentally moving the model to cuda:0 after the FSDP constructor. 
2023-01-11T22:54:21.9366999Z ok (10.324s) 2023-01-11T22:54:21.9367023Z 2023-01-11T22:54:21.9367310Z ---------------------------------------------------------------------- 2023-01-11T22:54:21.9367488Z Ran 59 tests in 569.202s 2023-01-11T22:54:21.9367508Z 2023-01-11T22:54:21.9367615Z OK (skipped=5) 2023-01-11T22:54:21.9367634Z 2023-01-11T22:54:21.9367757Z Generating XML reports... 2023-01-11T22:54:21.9368174Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestHooks-20230111224451.xml 2023-01-11T22:54:21.9368623Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestNoGrad-20230111224451.xml 2023-01-11T22:54:21.9369049Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParamInit-20230111224451.xml 2023-01-11T22:54:21.9369462Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParityWithDDP-20230111224451.xml 2023-01-11T22:54:21.9369499Z 2023-01-11T22:54:21.9369970Z ##[endgroup] 2023-01-11T22:54:21.9370426Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_core (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_core_25gv4ax7) 2023-01-11T22:54:21.9370452Z 2023-01-11T22:54:21.9370508Z 2023-01-11T22:54:21.9370620Z real 91m18.644s 2023-01-11T22:54:21.9370727Z user 145m14.909s 2023-01-11T22:54:21.9370829Z sys 80m7.350s 2023-01-11T22:54:21.9370943Z + assert_git_not_dirty 2023-01-11T22:54:21.9371190Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 != *rocm* ]] 2023-01-11T22:54:21.9371401Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 != *xla* ]] 2023-01-11T22:54:21.9371562Z ++ git status --porcelain 2023-01-11T22:54:22.6604933Z + git_status= 2023-01-11T22:54:22.6605339Z + [[ -n '' ]] 2023-01-11T22:54:22.6605736Z + [[ linux-bionic-cuda11.7-py3.10-gcc7 == *cuda* ]] 2023-01-11T22:54:22.6606037Z + [[ 3 == 1 ]] 2023-01-11T22:54:22.6606245Z + [[ 3 == 1 ]] 2023-01-11T22:54:22.6674139Z Prepare all required 
actions 2023-01-11T22:54:22.6674567Z Getting action download info 2023-01-11T22:54:22.8511460Z ##[group]Run ./.github/actions/get-workflow-job-id 2023-01-11T22:54:22.8511765Z with: 2023-01-11T22:54:22.8512250Z github-token: *** 2023-01-11T22:54:22.8512508Z env: 2023-01-11T22:54:22.8512735Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:54:22.8513006Z GPU_FLAG: --gpus all 2023-01-11T22:54:22.8513386Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:54:22.8513731Z ##[endgroup] 2023-01-11T22:54:22.8546030Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2023-01-11T22:54:22.8546350Z with: 2023-01-11T22:54:22.8546576Z shell: bash 2023-01-11T22:54:22.8546806Z timeout_minutes: 10 2023-01-11T22:54:22.8547058Z max_attempts: 5 2023-01-11T22:54:22.8547309Z retry_wait_seconds: 30 2023-01-11T22:54:22.8547851Z command: set -eux python3 -m pip install requests==2.26.0 GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}") echo "job-id=${GHA_WORKFLOW_JOB_ID}" >> "${GITHUB_OUTPUT}" 2023-01-11T22:54:22.8548376Z polling_interval_seconds: 1 2023-01-11T22:54:22.8548629Z warning_on_retry: true 2023-01-11T22:54:22.8548900Z continue_on_error: false 2023-01-11T22:54:22.8549146Z env: 2023-01-11T22:54:22.8549364Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:54:22.8549630Z GPU_FLAG: --gpus all 2023-01-11T22:54:22.8550003Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:54:22.8550518Z GITHUB_TOKEN: *** 2023-01-11T22:54:22.8550748Z ##[endgroup] 2023-01-11T22:54:22.9232610Z + python3 -m pip install requests==2.26.0 2023-01-11T22:54:23.2213723Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T22:54:23.2446863Z Requirement already satisfied: requests==2.26.0 in /home/ec2-user/.local/lib/python3.7/site-packages (2.26.0) 2023-01-11T22:54:23.2637519Z Requirement already satisfied: 
certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (2022.12.7) 2023-01-11T22:54:23.2650551Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (1.26.14) 2023-01-11T22:54:23.2878465Z Requirement already satisfied: idna<4,>=2.5; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (3.4) 2023-01-11T22:54:23.2895388Z Requirement already satisfied: charset-normalizer~=2.0.0; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (2.0.12) 2023-01-11T22:54:23.5466837Z ++ python3 .github/scripts/get_workflow_job_id.py 3896346758 i-0a2cfe12f2970a977 2023-01-11T22:54:26.7550665Z + GHA_WORKFLOW_JOB_ID=10589560684 2023-01-11T22:54:26.7553136Z + echo job-id=10589560684 2023-01-11T22:54:26.9235139Z Command completed after 1 attempt(s). 2023-01-11T22:54:26.9370273Z ##[group]Run kill "$MONITOR_SCRIPT_PID" 2023-01-11T22:54:26.9370635Z kill "$MONITOR_SCRIPT_PID" 2023-01-11T22:54:26.9384430Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:54:26.9384741Z env: 2023-01-11T22:54:26.9384989Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:54:26.9385256Z GPU_FLAG: --gpus all 2023-01-11T22:54:26.9385637Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:54:26.9386019Z MONITOR_SCRIPT_PID: 49027 2023-01-11T22:54:26.9386262Z ##[endgroup] 2023-01-11T22:54:26.9485840Z Prepare all required actions 2023-01-11T22:54:26.9486219Z Getting action download info 2023-01-11T22:54:27.1479712Z Download action repository 'actions/upload-artifact@v3' (SHA:0b7f8abb1508181956e8e162db84b466c27e18ce) 2023-01-11T22:54:27.3113788Z ##[group]Run ./.github/actions/upload-test-artifacts 2023-01-11T22:54:27.3114096Z with: 2023-01-11T22:54:27.3114435Z file-suffix: test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589560684 
2023-01-11T22:54:27.3114781Z env: 2023-01-11T22:54:27.3115022Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:54:27.3115276Z GPU_FLAG: --gpus all 2023-01-11T22:54:27.3115645Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:54:27.3116001Z ##[endgroup] 2023-01-11T22:54:27.3147251Z ##[group]Run # Remove any previous test jsons if they exist 2023-01-11T22:54:27.3147613Z # Remove any previous test jsons if they exist 2023-01-11T22:54:27.3147931Z rm -f test-jsons-*.zip 2023-01-11T22:54:27.3148428Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' 2023-01-11T22:54:27.3161238Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:54:27.3161539Z env: 2023-01-11T22:54:27.3161787Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:54:27.3162044Z GPU_FLAG: --gpus all 2023-01-11T22:54:27.3162421Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:54:27.3162905Z FILE_SUFFIX: test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589560684 2023-01-11T22:54:27.3163248Z ##[endgroup] 2023-01-11T22:54:27.3285026Z adding: test/allowlist_for_publicAPI.json (deflated 78%) 2023-01-11T22:54:27.3319962Z adding: test/benchmark_utils/callgrind_artifacts.json (deflated 92%) 2023-01-11T22:54:27.3327247Z adding: test/profiler/profiler_utils_mock_events.json (deflated 87%) 2023-01-11T22:54:27.3329133Z adding: test/.pytorch-slow-tests.json (deflated 77%) 2023-01-11T22:54:27.3335479Z adding: test/.pytorch-disabled-tests.json (deflated 84%) 2023-01-11T22:54:27.3377619Z ##[group]Run # Remove any previous test reports if they exist 2023-01-11T22:54:27.3377996Z # Remove any previous test reports if they exist 2023-01-11T22:54:27.3378323Z rm -f test-reports-*.zip 2023-01-11T22:54:27.3378684Z zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' -i '*.csv' 2023-01-11T22:54:27.3390738Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:54:27.3391039Z env: 
2023-01-11T22:54:27.3391283Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:54:27.3391538Z GPU_FLAG: --gpus all 2023-01-11T22:54:27.3391916Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:54:27.3392399Z FILE_SUFFIX: test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589560684 2023-01-11T22:54:27.3392924Z ##[endgroup] 2023-01-11T22:54:27.3545876Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_ignored_modules/TEST-TestFSDPIgnoredModules-20230111212309.xml (deflated 76%) 2023-01-11T22:54:27.3546716Z adding: test/test-reports/python-unittest/distributed._composable.test_checkpoint/TEST-TestCheckpoint-20230111221334.xml (deflated 75%) 2023-01-11T22:54:27.3547438Z adding: test/test-reports/python-unittest/distributed.test_nccl/TEST-TestNCCLCUDA-20230111221338.xml (deflated 83%) 2023-01-11T22:54:27.3548186Z adding: test/test-reports/python-unittest/distributed.checkpoint.test_traverse/TEST-TestTraverse-20230111221342.xml (deflated 77%) 2023-01-11T22:54:27.3548933Z adding: test/test-reports/python-unittest/distributed.nn.jit.test_instantiator/TEST-TestInstantiator-20230111221345.xml (deflated 63%) 2023-01-11T22:54:27.3549711Z adding: test/test-reports/python-unittest/distributed.checkpoint.test_utils/TEST-TestMedatadaIndex-20230111221349.xml (deflated 71%) 2023-01-11T22:54:27.3550527Z adding: test/test-reports/python-unittest/distributed._tensor.test_pointwise_ops/TEST-DistElementwiseOpsTest-20230111221353.xml (deflated 71%) 2023-01-11T22:54:27.3551504Z adding: test/test-reports/python-unittest/distributed.test_multi_threaded_pg/TEST-TestCollectivesWithBaseClass-20230111221357.xml (deflated 77%) 2023-01-11T22:54:27.3552367Z adding: test/test-reports/python-unittest/distributed.test_multi_threaded_pg/TEST-TestCollectivesWithWrapper-20230111221357.xml (deflated 74%) 2023-01-11T22:54:27.3553219Z adding: 
test/test-reports/python-unittest/distributed.checkpoint.test_fsdp_optim_state/TEST-FsdpOptimStateCheckpoint-20230111221401.xml (deflated 42%) 2023-01-11T22:54:27.3554040Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_traversal/TEST-TestTraversal-20230111221408.xml (deflated 41%) 2023-01-11T22:54:27.3554810Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_uneven/TEST-TestUnevenParamShard-20230111221415.xml (deflated 41%) 2023-01-11T22:54:27.3555684Z adding: test/test-reports/python-unittest/distributed.checkpoint.test_fsdp_model_state/TEST-FsdpModelStateCheckpoint-20230111221422.xml (deflated 60%) 2023-01-11T22:54:27.3556532Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_embedding/TEST-TestShardedEmbedding-20230111221431.xml (deflated 60%) 2023-01-11T22:54:27.3557385Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_chunk/TEST-TestShardedTensorChunkOps-20230111221439.xml (deflated 60%) 2023-01-11T22:54:27.3558197Z adding: test/test-reports/python-unittest/distributed.test_c10d_error_logger/TEST-C10dErrorLoggerTest-20230111221448.xml (deflated 53%) 2023-01-11T22:54:27.3559015Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_init/TEST-TestShardedTensorNNInit-20230111221457.xml (deflated 69%) 2023-01-11T22:54:27.3559781Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_pure_fp16/TEST-TestPureFP16-20230111221508.xml (deflated 60%) 2023-01-11T22:54:27.3560613Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_binary_cmp/TEST-TestShardedTensorBinaryOps-20230111221520.xml (deflated 73%) 2023-01-11T22:54:27.3561498Z adding: test/test-reports/python-unittest/distributed.tensor.parallel.test_2d_parallel/TEST-Test2dParallelIntegration-20230111221534.xml (deflated 77%) 2023-01-11T22:54:27.3562332Z adding: 
test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_tensor_ops/TEST-TestTensorOps-20230111221549.xml (deflated 75%) 2023-01-11T22:54:27.3563075Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_memory/TEST-TestFSDPMemory-20230111221605.xml (deflated 55%) 2023-01-11T22:54:27.3563854Z adding: test/test-reports/python-unittest/distributed.test_c10d_object_collectives/TEST-TestObjectCollectives-20230111221621.xml (deflated 68%) 2023-01-11T22:54:27.3564764Z adding: test/test-reports/python-unittest/distributed._tensor.test_tp_sharding_ops/TEST-TPShardingOpsTest-20230111221638.xml (deflated 78%) 2023-01-11T22:54:27.3565600Z adding: test/test-reports/python-unittest/distributed.tensor.parallel.test_tp_style/TEST-TensorParallelStyleTest-20230111221656.xml (deflated 82%) 2023-01-11T22:54:27.3566403Z adding: test/test-reports/python-unittest/distributed._tensor.test_redistribute/TEST-RedistributeTest-20230111221719.xml (deflated 76%) 2023-01-11T22:54:27.3567169Z adding: test/test-reports/python-unittest/distributed._tensor.test_redistribute/TEST-MultiDimRedistributeTest-20230111221719.xml (deflated 42%) 2023-01-11T22:54:27.3568024Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_matrix_ops/TEST-TestShardedTensorMatrixOps-20230111221745.xml (deflated 86%) 2023-01-11T22:54:27.3568850Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_flatten_params/TEST-TestFlattenParams-20230111221815.xml (deflated 77%) 2023-01-11T22:54:27.3569638Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_exec_order/TEST-TestFSDPExecOrder-20230111221848.xml (deflated 83%) 2023-01-11T22:54:27.3570492Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_sharded_grad_scaler/TEST-TestShardGradScaler-20230111221923.xml (deflated 64%) 2023-01-11T22:54:27.3571402Z adding: 
test/test-reports/python-unittest/distributed.fsdp.test_fsdp_sharded_grad_scaler/TEST-TestShardedGradScalerParityWithDDP-20230111221923.xml (deflated 83%) 2023-01-11T22:54:27.3572279Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_freezing_weights/TEST-TestFreezingWeights-20230111221958.xml (deflated 85%) 2023-01-11T22:54:27.3573939Z adding: test/test-reports/python-unittest/distributed._composable.test_fully_shard/TEST-TestFSDPInitialization-20230111222036.xml (deflated 72%) 2023-01-11T22:54:27.3574800Z adding: test/test-reports/python-unittest/distributed._composable.test_fully_shard/TEST-TestFSDPModelCheckpointing-20230111222036.xml (deflated 68%) 2023-01-11T22:54:27.3575596Z adding: test/test-reports/python-unittest/distributed._composable.test_fully_shard/TEST-TestFSDPRuntime-20230111222036.xml (deflated 51%) 2023-01-11T22:54:27.3576369Z adding: test/test-reports/python-unittest/distributed._composable.test_fully_shard/TEST-TestMixedPrecision-20230111222036.xml (deflated 39%) 2023-01-11T22:54:27.3577168Z adding: test/test-reports/python-unittest/distributed._composable.test_fully_shard/TEST-TestFSDPOptimStateDict-20230111222036.xml (deflated 60%) 2023-01-11T22:54:27.3577905Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-FileStoreTest-20230111222125.xml (deflated 40%) 2023-01-11T22:54:27.3578578Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-FileStoreTest-20230111222129.xml (deflated 39%) 2023-01-11T22:54:27.3579268Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-FileStoreTest-20230111222132.xml (deflated 39%) 2023-01-11T22:54:27.3579964Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-FileStoreTest-20230111222136.xml (deflated 40%) 2023-01-11T22:54:27.3580643Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-HashStoreTest-20230111222140.xml (deflated 40%) 2023-01-11T22:54:27.3581309Z adding: 
test/test-reports/python-unittest/distributed.test_store/TEST-HashStoreTest-20230111222144.xml (deflated 39%) 2023-01-11T22:54:27.3582022Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-PrefixFileStoreTest-20230111222148.xml (deflated 40%) 2023-01-11T22:54:27.3582750Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-PrefixFileStoreTest-20230111222152.xml (deflated 40%) 2023-01-11T22:54:27.3583467Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-PrefixStoreTest-20230111222155.xml (deflated 40%) 2023-01-11T22:54:27.3584167Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-PrefixTCPStoreTest-20230111222158.xml (deflated 40%) 2023-01-11T22:54:27.3585010Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-PrefixTCPStoreTest-20230111222201.xml (deflated 40%) 2023-01-11T22:54:27.3585738Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-PythonStoreTest-20230111222205.xml (deflated 40%) 2023-01-11T22:54:27.3586450Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-RendezvousEnvTest-20230111222209.xml (deflated 39%) 2023-01-11T22:54:27.3587145Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-RendezvousFileTest-20230111222213.xml (deflated 40%) 2023-01-11T22:54:27.3587859Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-RendezvousFileTest-20230111222217.xml (deflated 40%) 2023-01-11T22:54:27.3588573Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-RendezvousTCPTest-20230111222221.xml (deflated 40%) 2023-01-11T22:54:27.3589285Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-RendezvousTCPTest-20230111222225.xml (deflated 40%) 2023-01-11T22:54:27.3589974Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-RendezvousTCPTest-20230111222228.xml (deflated 39%) 2023-01-11T22:54:27.3590758Z adding: 
test/test-reports/python-unittest/distributed.test_store/TEST-RendezvousTCPTest-20230111222232.xml (deflated 40%) 2023-01-11T22:54:27.3591482Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-RendezvousTest-20230111222246.xml (deflated 39%) 2023-01-11T22:54:27.3592170Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-RendezvousTest-20230111222250.xml (deflated 39%) 2023-01-11T22:54:27.3592837Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222254.xml (deflated 39%) 2023-01-11T22:54:27.3593515Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222258.xml (deflated 39%) 2023-01-11T22:54:27.3594191Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222302.xml (deflated 38%) 2023-01-11T22:54:27.3594880Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222305.xml (deflated 38%) 2023-01-11T22:54:27.3595540Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222309.xml (deflated 38%) 2023-01-11T22:54:27.3596213Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222313.xml (deflated 39%) 2023-01-11T22:54:27.3596886Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222317.xml (deflated 39%) 2023-01-11T22:54:27.3597555Z adding: test/test-reports/python-unittest/distributed.test_store/TEST-TCPStoreTest-20230111222323.xml (deflated 39%) 2023-01-11T22:54:27.3598243Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_misc/TEST-TestFSDPMisc-20230111222327.xml (deflated 77%) 2023-01-11T22:54:27.3599010Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_misc/TEST-TestFSDPMiscWorldSize1-20230111222327.xml (deflated 42%) 2023-01-11T22:54:27.3599805Z adding: 
test/test-reports/python-unittest/distributed.fsdp.test_fsdp_checkpoint/TEST-TestFSDPCheckpoint-20230111222433.xml (deflated 90%) 2023-01-11T22:54:27.3600631Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_checkpoint/TEST-TestFSDPCheckpointSubmodule-20230111222433.xml (deflated 43%) 2023-01-11T22:54:27.3601569Z adding: test/test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer/TEST-TestZeroRedundancyOptimizerDistributed-20230111222543.xml (deflated 90%) 2023-01-11T22:54:27.3602556Z adding: test/test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer/TEST-TestZeroRedundancyOptimizerSingleRank-20230111222543.xml (deflated 73%) 2023-01-11T22:54:27.3603447Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params/TEST-TestSummonFullParams-20230111222823.xml (deflated 91%) 2023-01-11T22:54:27.3604385Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params/TEST-TestSummonFullParamsNoShard-20230111222823.xml (deflated 43%) 2023-01-11T22:54:27.3605131Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223128.xml (deflated 39%) 2023-01-11T22:54:27.3605799Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223135.xml (deflated 38%) 2023-01-11T22:54:27.3606457Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223142.xml (deflated 38%) 2023-01-11T22:54:27.3607120Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223148.xml (deflated 38%) 2023-01-11T22:54:27.3607759Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223154.xml (deflated 38%) 2023-01-11T22:54:27.3608420Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223202.xml (deflated 38%) 2023-01-11T22:54:27.3609085Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223209.xml (deflated 39%) 2023-01-11T22:54:27.3609792Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223215.xml (deflated 38%) 2023-01-11T22:54:27.3610456Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223221.xml (deflated 38%) 2023-01-11T22:54:27.3611113Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223227.xml (deflated 38%) 2023-01-11T22:54:27.3611759Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20230111223234.xml (deflated 38%) 2023-01-11T22:54:27.3612436Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223240.xml (deflated 38%) 2023-01-11T22:54:27.3613959Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223246.xml (deflated 38%) 2023-01-11T22:54:27.3614680Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223253.xml (deflated 38%) 2023-01-11T22:54:27.3615367Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223300.xml (deflated 38%) 2023-01-11T22:54:27.3616050Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223307.xml (deflated 38%) 2023-01-11T22:54:27.3616711Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223313.xml (deflated 38%) 2023-01-11T22:54:27.3617394Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223320.xml (deflated 39%) 2023-01-11T22:54:27.3618073Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223327.xml (deflated 38%) 2023-01-11T22:54:27.3618754Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223334.xml (deflated 38%) 2023-01-11T22:54:27.3619473Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223340.xml (deflated 38%) 2023-01-11T22:54:27.3620133Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CompilerTest-20230111223347.xml (deflated 38%) 2023-01-11T22:54:27.3620883Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223354.xml (deflated 45%) 2023-01-11T22:54:27.3621679Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223402.xml (deflated 45%) 2023-01-11T22:54:27.3622472Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223409.xml (deflated 43%) 2023-01-11T22:54:27.3623257Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223417.xml (deflated 43%) 2023-01-11T22:54:27.3624167Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223425.xml (deflated 45%) 2023-01-11T22:54:27.3624960Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223432.xml (deflated 45%) 2023-01-11T22:54:27.3625741Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223440.xml (deflated 47%) 2023-01-11T22:54:27.3626519Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223448.xml (deflated 47%) 2023-01-11T22:54:27.3627307Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223456.xml (deflated 44%) 2023-01-11T22:54:27.3628098Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223503.xml (deflated 46%) 2023-01-11T22:54:27.3628886Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223511.xml (deflated 46%) 2023-01-11T22:54:27.3629730Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223519.xml (deflated 44%) 2023-01-11T22:54:27.3630534Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223526.xml (deflated 44%) 2023-01-11T22:54:27.3631315Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223534.xml (deflated 43%) 2023-01-11T22:54:27.3632107Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223540.xml (deflated 44%) 2023-01-11T22:54:27.3632874Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223547.xml (deflated 45%) 2023-01-11T22:54:27.3633658Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223553.xml (deflated 44%) 2023-01-11T22:54:27.3634445Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223600.xml (deflated 45%) 2023-01-11T22:54:27.3635228Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223606.xml (deflated 45%) 2023-01-11T22:54:27.3635996Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223612.xml (deflated 50%) 2023-01-11T22:54:27.3636787Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223620.xml (deflated 42%) 2023-01-11T22:54:27.3637566Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223627.xml (deflated 41%) 2023-01-11T22:54:27.3638357Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223634.xml (deflated 41%) 2023-01-11T22:54:27.3639132Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223641.xml (deflated 41%) 2023-01-11T22:54:27.3639919Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223649.xml (deflated 41%) 2023-01-11T22:54:27.3640702Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223656.xml (deflated 42%) 2023-01-11T22:54:27.3641489Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223703.xml (deflated 42%) 2023-01-11T22:54:27.3642251Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223709.xml (deflated 41%) 2023-01-11T22:54:27.3643121Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223715.xml (deflated 41%) 2023-01-11T22:54:27.3643904Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223721.xml (deflated 44%) 2023-01-11T22:54:27.3644694Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223727.xml (deflated 45%) 2023-01-11T22:54:27.3645465Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223734.xml (deflated 41%) 2023-01-11T22:54:27.3646293Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223741.xml (deflated 40%) 2023-01-11T22:54:27.3647075Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223747.xml (deflated 41%) 2023-01-11T22:54:27.3647862Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223755.xml (deflated 41%) 2023-01-11T22:54:27.3648729Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223801.xml (deflated 42%) 2023-01-11T22:54:27.3649523Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223807.xml (deflated 41%) 2023-01-11T22:54:27.3650306Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20230111223815.xml (deflated 41%) 2023-01-11T22:54:27.3651193Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223823.xml (deflated 43%) 2023-01-11T22:54:27.3652157Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223829.xml (deflated 42%) 2023-01-11T22:54:27.3653791Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223835.xml (deflated 42%) 2023-01-11T22:54:27.3654797Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223841.xml (deflated 44%) 2023-01-11T22:54:27.3655788Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-GlooProcessGroupWithDispatchedCollectivesTests-20230111223847.xml (deflated 42%) 2023-01-11T22:54:27.3656634Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223853.xml (deflated 39%) 2023-01-11T22:54:27.3657383Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223900.xml (deflated 39%) 2023-01-11T22:54:27.3658122Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223907.xml (deflated 39%) 2023-01-11T22:54:27.3658865Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223914.xml (deflated 40%) 2023-01-11T22:54:27.3659611Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223920.xml (deflated 40%) 2023-01-11T22:54:27.3660352Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223927.xml (deflated 40%) 2023-01-11T22:54:27.3661073Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223933.xml (deflated 40%) 2023-01-11T22:54:27.3661815Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223940.xml (deflated 39%) 2023-01-11T22:54:27.3662565Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223949.xml (deflated 40%) 2023-01-11T22:54:27.3663303Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111223956.xml (deflated 40%) 2023-01-11T22:54:27.3664150Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224003.xml (deflated 40%) 2023-01-11T22:54:27.3664901Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224011.xml (deflated 39%) 2023-01-11T22:54:27.3665644Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224017.xml (deflated 40%) 2023-01-11T22:54:27.3666379Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224024.xml (deflated 40%) 2023-01-11T22:54:27.3667099Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224030.xml (deflated 40%) 2023-01-11T22:54:27.3667840Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224036.xml (deflated 40%) 2023-01-11T22:54:27.3668588Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224043.xml (deflated 40%) 2023-01-11T22:54:27.3669392Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224050.xml (deflated 40%) 2023-01-11T22:54:27.3670134Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224057.xml (deflated 40%) 2023-01-11T22:54:27.3670872Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224104.xml (deflated 40%) 2023-01-11T22:54:27.3671611Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224111.xml (deflated 40%) 2023-01-11T22:54:27.3672354Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224118.xml (deflated 40%) 2023-01-11T22:54:27.3673080Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224124.xml (deflated 40%) 2023-01-11T22:54:27.3673820Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224132.xml (deflated 40%) 2023-01-11T22:54:27.3674561Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224138.xml (deflated 40%) 2023-01-11T22:54:27.3675296Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224145.xml (deflated 40%) 2023-01-11T22:54:27.3676012Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224152.xml (deflated 40%) 2023-01-11T22:54:27.3676752Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224159.xml (deflated 39%) 2023-01-11T22:54:27.3677481Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224205.xml (deflated 40%) 2023-01-11T22:54:27.3678225Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224213.xml (deflated 39%) 2023-01-11T22:54:27.3678947Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224219.xml (deflated 40%) 2023-01-11T22:54:27.3679692Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224226.xml (deflated 40%) 2023-01-11T22:54:27.3680430Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224233.xml (deflated 39%) 2023-01-11T22:54:27.3681169Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224242.xml (deflated 40%) 2023-01-11T22:54:27.3681887Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224249.xml (deflated 40%) 2023-01-11T22:54:27.3682703Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224255.xml (deflated 40%) 2023-01-11T22:54:27.3683441Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224303.xml (deflated 40%) 2023-01-11T22:54:27.3684173Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224309.xml (deflated 40%) 2023-01-11T22:54:27.3684896Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224316.xml (deflated 40%) 2023-01-11T22:54:27.3685636Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224324.xml (deflated 40%) 2023-01-11T22:54:27.3686370Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224330.xml (deflated 40%) 2023-01-11T22:54:27.3687104Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224337.xml (deflated 40%) 2023-01-11T22:54:27.3687888Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224344.xml (deflated 39%) 2023-01-11T22:54:27.3688649Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224351.xml (deflated 39%) 2023-01-11T22:54:27.3689393Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224357.xml (deflated 40%) 2023-01-11T22:54:27.3690128Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224405.xml (deflated 41%) 2023-01-11T22:54:27.3690844Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224407.xml (deflated 40%) 2023-01-11T22:54:27.3691576Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224413.xml (deflated 40%) 2023-01-11T22:54:27.3692383Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224415.xml (deflated 40%) 2023-01-11T22:54:27.3693858Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20230111224423.xml (deflated 40%) 2023-01-11T22:54:27.3694601Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224430.xml (deflated 39%) 2023-01-11T22:54:27.3695264Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224432.xml (deflated 39%) 2023-01-11T22:54:27.3695946Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224434.xml (deflated 39%) 2023-01-11T22:54:27.3696616Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224436.xml (deflated 39%) 2023-01-11T22:54:27.3697287Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224438.xml (deflated 38%) 2023-01-11T22:54:27.3697958Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20230111224441.xml (deflated 39%) 2023-01-11T22:54:27.3698659Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-RendezvousEnvTest-20230111224443.xml (deflated 39%) 2023-01-11T22:54:27.3699362Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-TimeoutTest-20230111224447.xml (deflated 41%) 2023-01-11T22:54:27.3700049Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestHooks-20230111224451.xml (deflated 79%) 2023-01-11T22:54:27.3700737Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestNoGrad-20230111224451.xml (deflated 64%) 2023-01-11T22:54:27.3701454Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParamInit-20230111224451.xml (deflated 61%) 2023-01-11T22:54:27.3702308Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParityWithDDP-20230111224451.xml (deflated 91%) 2023-01-11T22:54:27.3703085Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212338.xml (deflated 41%) 2023-01-11T22:54:27.3703854Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212346.xml (deflated 42%) 2023-01-11T22:54:27.3704641Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212349.xml (deflated 41%) 2023-01-11T22:54:27.3705410Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212356.xml (deflated 43%) 2023-01-11T22:54:27.3706185Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212400.xml (deflated 42%) 2023-01-11T22:54:27.3706945Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212408.xml (deflated 41%) 2023-01-11T22:54:27.3707788Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212416.xml (deflated 41%) 2023-01-11T22:54:27.3708588Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212425.xml (deflated 40%) 2023-01-11T22:54:27.3709361Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212433.xml (deflated 41%) 2023-01-11T22:54:27.3710120Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212441.xml (deflated 40%) 2023-01-11T22:54:27.3710890Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212449.xml (deflated 39%) 2023-01-11T22:54:27.3711665Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212458.xml (deflated 40%) 2023-01-11T22:54:27.3712443Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212506.xml (deflated 40%) 2023-01-11T22:54:27.3713195Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212514.xml (deflated 42%) 2023-01-11T22:54:27.3713972Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212519.xml (deflated 41%) 2023-01-11T22:54:27.3714737Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212525.xml (deflated 42%) 2023-01-11T22:54:27.3715505Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212529.xml (deflated 42%) 2023-01-11T22:54:27.3716269Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212537.xml (deflated 42%) 2023-01-11T22:54:27.3717028Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212540.xml (deflated 45%) 2023-01-11T22:54:27.3717803Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212542.xml (deflated 47%) 2023-01-11T22:54:27.3718572Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212545.xml (deflated 48%) 2023-01-11T22:54:27.3719338Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212547.xml (deflated 45%) 2023-01-11T22:54:27.3720088Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212549.xml (deflated 40%) 2023-01-11T22:54:27.3720852Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212558.xml (deflated 43%) 2023-01-11T22:54:27.3721694Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212600.xml (deflated 44%) 2023-01-11T22:54:27.3722460Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212602.xml (deflated 43%) 2023-01-11T22:54:27.3723216Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212605.xml (deflated 44%) 2023-01-11T22:54:27.3723981Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212607.xml (deflated 43%) 2023-01-11T22:54:27.3724748Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212610.xml (deflated 40%) 2023-01-11T22:54:27.3725512Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212618.xml (deflated 41%) 2023-01-11T22:54:27.3726269Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212620.xml (deflated 42%) 2023-01-11T22:54:27.3727093Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212623.xml (deflated 41%) 2023-01-11T22:54:27.3727883Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212631.xml (deflated 42%) 2023-01-11T22:54:27.3728643Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212638.xml (deflated 42%) 2023-01-11T22:54:27.3729389Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212640.xml (deflated 42%) 2023-01-11T22:54:27.3730156Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212643.xml (deflated 42%) 2023-01-11T22:54:27.3730927Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212645.xml (deflated 42%) 2023-01-11T22:54:27.3731692Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212647.xml (deflated 40%) 2023-01-11T22:54:27.3732443Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212656.xml (deflated 40%) 2023-01-11T22:54:27.3734038Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212705.xml (deflated 43%) 2023-01-11T22:54:27.3734822Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212707.xml (deflated 41%) 2023-01-11T22:54:27.3735589Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212710.xml (deflated 41%) 2023-01-11T22:54:27.3736338Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212712.xml (deflated 41%) 2023-01-11T22:54:27.3737114Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212714.xml (deflated 41%) 2023-01-11T22:54:27.3737882Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212717.xml (deflated 41%) 2023-01-11T22:54:27.3738655Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212719.xml (deflated 41%) 2023-01-11T22:54:27.3739406Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212722.xml (deflated 41%) 2023-01-11T22:54:27.3740175Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212724.xml (deflated 41%) 2023-01-11T22:54:27.3740944Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212726.xml (deflated 41%) 2023-01-11T22:54:27.3741825Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212729.xml (deflated 41%) 2023-01-11T22:54:27.3742582Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212735.xml (deflated 41%) 2023-01-11T22:54:27.3743357Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212738.xml (deflated 41%) 2023-01-11T22:54:27.3744123Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212740.xml (deflated 41%) 2023-01-11T22:54:27.3744886Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212743.xml (deflated 40%) 2023-01-11T22:54:27.3745636Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212749.xml (deflated 40%) 2023-01-11T22:54:27.3746412Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212758.xml (deflated 41%) 2023-01-11T22:54:27.3747253Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212806.xml (deflated 41%) 2023-01-11T22:54:27.3748046Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212815.xml (deflated 40%) 2023-01-11T22:54:27.3748795Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212823.xml (deflated 42%) 2023-01-11T22:54:27.3749559Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212830.xml (deflated 42%) 2023-01-11T22:54:27.3750325Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212836.xml (deflated 42%) 2023-01-11T22:54:27.3751096Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212843.xml (deflated 42%) 2023-01-11T22:54:27.3751850Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212850.xml (deflated 41%) 2023-01-11T22:54:27.3752621Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212858.xml (deflated 41%) 2023-01-11T22:54:27.3753391Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212907.xml (deflated 40%) 2023-01-11T22:54:27.3754162Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212915.xml (deflated 41%) 2023-01-11T22:54:27.3754911Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212924.xml (deflated 40%) 2023-01-11T22:54:27.3755677Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212932.xml (deflated 41%) 2023-01-11T22:54:27.3756482Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212940.xml (deflated 40%) 2023-01-11T22:54:27.3757249Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212949.xml (deflated 40%) 2023-01-11T22:54:27.3758021Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111212957.xml (deflated 40%) 2023-01-11T22:54:27.3758771Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213005.xml (deflated 41%) 2023-01-11T22:54:27.3759543Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213008.xml (deflated 41%) 2023-01-11T22:54:27.3760314Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213010.xml (deflated 41%) 2023-01-11T22:54:27.3761230Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213013.xml (deflated 43%) 2023-01-11T22:54:27.3761997Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213015.xml (deflated 43%) 2023-01-11T22:54:27.3762756Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213017.xml (deflated 42%) 2023-01-11T22:54:27.3763502Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213020.xml (deflated 42%) 2023-01-11T22:54:27.3764273Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213022.xml (deflated 43%) 2023-01-11T22:54:27.3765034Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213025.xml (deflated 42%) 2023-01-11T22:54:27.3765808Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213027.xml (deflated 43%) 2023-01-11T22:54:27.3766621Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213029.xml (deflated 42%) 2023-01-11T22:54:27.3767407Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213032.xml (deflated 43%) 2023-01-11T22:54:27.3768169Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213034.xml (deflated 42%) 2023-01-11T22:54:27.3768939Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213037.xml (deflated 43%) 2023-01-11T22:54:27.3769686Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213039.xml (deflated 43%) 2023-01-11T22:54:27.3770455Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213041.xml (deflated 42%) 2023-01-11T22:54:27.3771226Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213044.xml (deflated 42%) 2023-01-11T22:54:27.3771991Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213046.xml (deflated 42%) 2023-01-11T22:54:27.3772743Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213048.xml (deflated 43%) 2023-01-11T22:54:27.3774304Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213051.xml (deflated 42%) 2023-01-11T22:54:27.3775078Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213053.xml (deflated 42%) 2023-01-11T22:54:27.3775841Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213056.xml (deflated 42%) 2023-01-11T22:54:27.3776597Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213058.xml (deflated 42%) 2023-01-11T22:54:27.3777364Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213100.xml (deflated 42%) 2023-01-11T22:54:27.3778129Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213103.xml (deflated 42%) 2023-01-11T22:54:27.3778889Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213105.xml (deflated 42%) 2023-01-11T22:54:27.3779638Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213107.xml (deflated 43%) 2023-01-11T22:54:27.3780402Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213110.xml (deflated 41%) 2023-01-11T22:54:27.3781292Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213119.xml (deflated 42%) 2023-01-11T22:54:27.3782058Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213125.xml (deflated 42%) 2023-01-11T22:54:27.3782806Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213128.xml (deflated 43%) 2023-01-11T22:54:27.3783575Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213130.xml (deflated 41%) 2023-01-11T22:54:27.3784347Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213139.xml (deflated 42%) 2023-01-11T22:54:27.3785103Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213141.xml (deflated 42%) 2023-01-11T22:54:27.3785854Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213148.xml (deflated 42%) 2023-01-11T22:54:27.3786712Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213150.xml (deflated 42%) 2023-01-11T22:54:27.3787505Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213157.xml (deflated 42%) 2023-01-11T22:54:27.3788277Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213200.xml (deflated 43%) 2023-01-11T22:54:27.3789040Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213202.xml (deflated 43%) 2023-01-11T22:54:27.3789784Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213204.xml (deflated 41%) 2023-01-11T22:54:27.3790561Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213207.xml (deflated 41%) 2023-01-11T22:54:27.3791327Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213209.xml (deflated 41%) 2023-01-11T22:54:27.3792090Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213211.xml (deflated 41%) 2023-01-11T22:54:27.3792839Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213214.xml (deflated 40%) 2023-01-11T22:54:27.3793606Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213216.xml (deflated 41%) 2023-01-11T22:54:27.3794360Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213219.xml (deflated 41%) 2023-01-11T22:54:27.3795130Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213221.xml (deflated 41%) 2023-01-11T22:54:27.3795884Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213223.xml (deflated 41%) 2023-01-11T22:54:27.3796648Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213226.xml (deflated 41%) 2023-01-11T22:54:27.3797411Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213228.xml (deflated 41%) 2023-01-11T22:54:27.3798177Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213237.xml (deflated 41%) 2023-01-11T22:54:27.3798924Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213239.xml (deflated 40%) 2023-01-11T22:54:27.3799767Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213247.xml (deflated 42%) 2023-01-11T22:54:27.3800532Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213254.xml (deflated 40%) 2023-01-11T22:54:27.3801294Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213302.xml (deflated 42%) 2023-01-11T22:54:27.3802041Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213306.xml (deflated 42%) 2023-01-11T22:54:27.3802810Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213310.xml (deflated 42%) 2023-01-11T22:54:27.3803565Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213315.xml (deflated 41%) 2023-01-11T22:54:27.3804328Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213324.xml (deflated 40%) 2023-01-11T22:54:27.3805132Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213334.xml (deflated 40%) 2023-01-11T22:54:27.3805918Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213343.xml (deflated 40%) 2023-01-11T22:54:27.3806681Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213351.xml (deflated 40%) 2023-01-11T22:54:27.3807446Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213400.xml (deflated 42%) 2023-01-11T22:54:27.3808186Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213404.xml (deflated 42%) 2023-01-11T22:54:27.3808952Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213408.xml (deflated 40%) 2023-01-11T22:54:27.3809720Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213416.xml (deflated 40%) 2023-01-11T22:54:27.3810486Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213424.xml (deflated 40%) 2023-01-11T22:54:27.3811236Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213432.xml (deflated 40%) 2023-01-11T22:54:27.3811998Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213441.xml (deflated 42%) 2023-01-11T22:54:27.3812767Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213445.xml (deflated 40%) 2023-01-11T22:54:27.3814036Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213453.xml (deflated 42%) 2023-01-11T22:54:27.3814797Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213457.xml (deflated 41%) 2023-01-11T22:54:27.3815570Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213506.xml (deflated 42%) 2023-01-11T22:54:27.3816334Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213510.xml (deflated 42%) 2023-01-11T22:54:27.3817093Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213514.xml (deflated 40%) 2023-01-11T22:54:27.3817848Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213523.xml (deflated 40%) 2023-01-11T22:54:27.3818611Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213532.xml (deflated 42%) 2023-01-11T22:54:27.3819484Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213536.xml (deflated 40%) 2023-01-11T22:54:27.3820255Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213545.xml (deflated 42%) 2023-01-11T22:54:27.3821006Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213547.xml (deflated 42%) 2023-01-11T22:54:27.3821774Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213549.xml (deflated 41%) 2023-01-11T22:54:27.3822546Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213552.xml (deflated 41%) 2023-01-11T22:54:27.3823303Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213554.xml (deflated 41%) 2023-01-11T22:54:27.3824056Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213556.xml (deflated 41%) 2023-01-11T22:54:27.3824897Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213559.xml (deflated 41%) 2023-01-11T22:54:27.3825690Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213601.xml (deflated 41%) 2023-01-11T22:54:27.3826451Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213604.xml (deflated 41%) 2023-01-11T22:54:27.3827201Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213606.xml (deflated 41%) 2023-01-11T22:54:27.3827964Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213608.xml (deflated 41%) 2023-01-11T22:54:27.3828732Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213611.xml (deflated 42%) 2023-01-11T22:54:27.3829510Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213613.xml (deflated 42%) 2023-01-11T22:54:27.3830275Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213617.xml (deflated 41%) 2023-01-11T22:54:27.3831021Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213626.xml (deflated 40%) 2023-01-11T22:54:27.3831793Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213634.xml (deflated 40%) 2023-01-11T22:54:27.3832552Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213643.xml (deflated 40%) 2023-01-11T22:54:27.3833318Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213651.xml (deflated 40%) 2023-01-11T22:54:27.3834068Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213659.xml (deflated 40%) 2023-01-11T22:54:27.3834836Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213717.xml (deflated 41%) 2023-01-11T22:54:27.3835602Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213726.xml (deflated 41%) 2023-01-11T22:54:27.3836365Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213735.xml (deflated 41%) 2023-01-11T22:54:27.3837108Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213743.xml (deflated 40%) 2023-01-11T22:54:27.3837873Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213751.xml (deflated 42%) 2023-01-11T22:54:27.3838723Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213756.xml (deflated 42%) 2023-01-11T22:54:27.3839493Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213800.xml (deflated 42%) 2023-01-11T22:54:27.3840241Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213809.xml (deflated 41%) 2023-01-11T22:54:27.3841012Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213817.xml (deflated 42%) 2023-01-11T22:54:27.3841773Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213821.xml (deflated 40%) 2023-01-11T22:54:27.3842536Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213829.xml (deflated 42%) 2023-01-11T22:54:27.3843289Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213834.xml (deflated 41%) 2023-01-11T22:54:27.3844109Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213843.xml (deflated 41%) 2023-01-11T22:54:27.3844899Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213851.xml (deflated 40%) 2023-01-11T22:54:27.3845711Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213859.xml (deflated 42%) 2023-01-11T22:54:27.3846459Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213903.xml (deflated 42%) 2023-01-11T22:54:27.3847225Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213907.xml (deflated 42%) 2023-01-11T22:54:27.3847998Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213911.xml (deflated 41%) 2023-01-11T22:54:27.3848764Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213920.xml (deflated 41%) 2023-01-11T22:54:27.3849515Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213928.xml (deflated 40%) 2023-01-11T22:54:27.3850290Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213935.xml (deflated 41%) 2023-01-11T22:54:27.3851055Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213942.xml (deflated 42%) 2023-01-11T22:54:27.3851821Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213946.xml (deflated 42%) 2023-01-11T22:54:27.3852566Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213950.xml (deflated 40%) 2023-01-11T22:54:27.3854144Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213957.xml (deflated 41%) 2023-01-11T22:54:27.3854922Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111213959.xml (deflated 40%) 2023-01-11T22:54:27.3855697Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214002.xml (deflated 42%) 2023-01-11T22:54:27.3856446Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214004.xml (deflated 41%) 2023-01-11T22:54:27.3857234Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214006.xml (deflated 41%) 2023-01-11T22:54:27.3858002Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214009.xml (deflated 41%) 2023-01-11T22:54:27.3858880Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214011.xml (deflated 41%) 2023-01-11T22:54:27.3859631Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214013.xml (deflated 41%) 2023-01-11T22:54:27.3860409Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214020.xml (deflated 43%) 2023-01-11T22:54:27.3861174Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214023.xml (deflated 41%) 2023-01-11T22:54:27.3861943Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214030.xml (deflated 40%) 2023-01-11T22:54:27.3862694Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214037.xml (deflated 40%) 2023-01-11T22:54:27.3863466Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214044.xml (deflated 40%) 2023-01-11T22:54:27.3864328Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214052.xml (deflated 41%) 2023-01-11T22:54:27.3865126Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214100.xml (deflated 41%) 2023-01-11T22:54:27.3865895Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214108.xml (deflated 41%) 2023-01-11T22:54:27.3866757Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214117.xml (deflated 41%) 2023-01-11T22:54:27.3867520Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214125.xml (deflated 41%) 2023-01-11T22:54:27.3868275Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214149.xml (deflated 41%) 2023-01-11T22:54:27.3869047Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214213.xml (deflated 42%) 2023-01-11T22:54:27.3869815Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214215.xml (deflated 42%) 2023-01-11T22:54:27.3870584Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214218.xml (deflated 41%) 2023-01-11T22:54:27.3871325Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214220.xml (deflated 41%) 2023-01-11T22:54:27.3872086Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214222.xml (deflated 41%) 2023-01-11T22:54:27.3872851Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214225.xml (deflated 42%) 2023-01-11T22:54:27.3873624Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214227.xml (deflated 42%) 2023-01-11T22:54:27.3874377Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214230.xml (deflated 42%) 2023-01-11T22:54:27.3875142Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214232.xml (deflated 42%) 2023-01-11T22:54:27.3875903Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214234.xml (deflated 42%) 2023-01-11T22:54:27.3876672Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214237.xml (deflated 42%) 2023-01-11T22:54:27.3877419Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214239.xml (deflated 42%) 2023-01-11T22:54:27.3878266Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214241.xml (deflated 42%) 2023-01-11T22:54:27.3879029Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214244.xml (deflated 40%) 2023-01-11T22:54:27.3879798Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214251.xml (deflated 40%) 2023-01-11T22:54:27.3880555Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214258.xml (deflated 41%) 2023-01-11T22:54:27.3881319Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214300.xml (deflated 42%) 2023-01-11T22:54:27.3882083Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214302.xml (deflated 42%) 2023-01-11T22:54:27.3882854Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214306.xml (deflated 40%) 2023-01-11T22:54:27.3883659Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214315.xml (deflated 41%) 2023-01-11T22:54:27.3884454Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214323.xml (deflated 40%) 2023-01-11T22:54:27.3885220Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214332.xml (deflated 42%) 2023-01-11T22:54:27.3885984Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214336.xml (deflated 41%) 2023-01-11T22:54:27.3886728Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214340.xml (deflated 41%) 2023-01-11T22:54:27.3887499Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214347.xml (deflated 40%) 2023-01-11T22:54:27.3888270Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214354.xml (deflated 42%) 2023-01-11T22:54:27.3889039Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214358.xml (deflated 40%) 2023-01-11T22:54:27.3889789Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214406.xml (deflated 40%) 2023-01-11T22:54:27.3890562Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214414.xml (deflated 40%) 2023-01-11T22:54:27.3891327Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214423.xml (deflated 40%) 2023-01-11T22:54:27.3892095Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214431.xml (deflated 42%) 2023-01-11T22:54:27.3892848Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214438.xml (deflated 43%) 2023-01-11T22:54:27.3894395Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214445.xml (deflated 42%) 2023-01-11T22:54:27.3895170Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214452.xml (deflated 42%) 2023-01-11T22:54:27.3895935Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214458.xml (deflated 41%) 2023-01-11T22:54:27.3896688Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214507.xml (deflated 41%) 2023-01-11T22:54:27.3897573Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214515.xml (deflated 42%) 2023-01-11T22:54:27.3898342Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214518.xml (deflated 41%) 2023-01-11T22:54:27.3899109Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214526.xml (deflated 43%) 2023-01-11T22:54:27.3899861Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214529.xml (deflated 43%) 2023-01-11T22:54:27.3900627Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214531.xml (deflated 40%) 2023-01-11T22:54:27.3901394Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214540.xml (deflated 42%) 2023-01-11T22:54:27.3902158Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214542.xml (deflated 42%) 2023-01-11T22:54:27.3902992Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214544.xml (deflated 40%) 2023-01-11T22:54:27.3903769Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214553.xml (deflated 41%) 2023-01-11T22:54:27.3904543Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214555.xml (deflated 41%) 2023-01-11T22:54:27.3905307Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214558.xml (deflated 41%) 2023-01-11T22:54:27.3906067Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214600.xml (deflated 41%) 2023-01-11T22:54:27.3906814Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214602.xml (deflated 41%) 2023-01-11T22:54:27.3907584Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214605.xml (deflated 41%) 2023-01-11T22:54:27.3908353Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214607.xml (deflated 41%) 2023-01-11T22:54:27.3909120Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214609.xml (deflated 41%) 2023-01-11T22:54:27.3909867Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214612.xml (deflated 41%) 2023-01-11T22:54:27.3910631Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214620.xml (deflated 42%) 2023-01-11T22:54:27.3911399Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214622.xml (deflated 42%) 2023-01-11T22:54:27.3912164Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214625.xml (deflated 42%) 2023-01-11T22:54:27.3912916Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214627.xml (deflated 41%) 2023-01-11T22:54:27.3913687Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214635.xml (deflated 41%) 2023-01-11T22:54:27.3914454Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214638.xml (deflated 40%) 2023-01-11T22:54:27.3915224Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214640.xml (deflated 41%) 2023-01-11T22:54:27.3915972Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214643.xml (deflated 40%) 2023-01-11T22:54:27.3916811Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214651.xml (deflated 41%) 2023-01-11T22:54:27.3917582Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214659.xml (deflated 40%) 2023-01-11T22:54:27.3918344Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214707.xml (deflated 40%) 2023-01-11T22:54:27.3919099Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214716.xml (deflated 42%) 2023-01-11T22:54:27.3919859Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214718.xml (deflated 42%) 2023-01-11T22:54:27.3920623Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214721.xml (deflated 41%) 2023-01-11T22:54:27.3921390Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214729.xml (deflated 40%) 2023-01-11T22:54:27.3922211Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214737.xml (deflated 41%) 2023-01-11T22:54:27.3923006Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214740.xml (deflated 41%) 2023-01-11T22:54:27.3923768Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214748.xml (deflated 40%) 2023-01-11T22:54:27.3924536Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214801.xml (deflated 41%) 2023-01-11T22:54:27.3925291Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214816.xml (deflated 42%) 2023-01-11T22:54:27.3926062Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214825.xml (deflated 42%) 2023-01-11T22:54:27.3926839Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214827.xml (deflated 42%) 2023-01-11T22:54:27.3927605Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214834.xml (deflated 42%) 2023-01-11T22:54:27.3928359Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214838.xml (deflated 41%) 2023-01-11T22:54:27.3929126Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214846.xml (deflated 41%) 2023-01-11T22:54:27.3929890Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214854.xml (deflated 41%) 2023-01-11T22:54:27.3930660Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214903.xml (deflated 40%) 2023-01-11T22:54:27.3931409Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214911.xml (deflated 40%) 2023-01-11T22:54:27.3932183Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214919.xml (deflated 39%) 2023-01-11T22:54:27.3933571Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214928.xml (deflated 39%) 2023-01-11T22:54:27.3934439Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214936.xml (deflated 40%) 2023-01-11T22:54:27.3935186Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214945.xml (deflated 40%) 2023-01-11T22:54:27.3935959Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214953.xml (deflated 42%) 2023-01-11T22:54:27.3936836Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111214957.xml (deflated 41%) 2023-01-11T22:54:27.3937616Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215004.xml (deflated 42%) 2023-01-11T22:54:27.3938374Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215008.xml (deflated 42%) 2023-01-11T22:54:27.3939138Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215016.xml (deflated 42%) 2023-01-11T22:54:27.3939905Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215018.xml (deflated 45%) 2023-01-11T22:54:27.3940675Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215021.xml (deflated 47%) 2023-01-11T22:54:27.3941432Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215023.xml (deflated 48%) 2023-01-11T22:54:27.3942265Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215026.xml (deflated 45%) 2023-01-11T22:54:27.3943054Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215028.xml (deflated 40%) 2023-01-11T22:54:27.3943819Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215036.xml (deflated 43%) 2023-01-11T22:54:27.3944585Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215039.xml (deflated 44%) 2023-01-11T22:54:27.3945334Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215041.xml (deflated 43%) 2023-01-11T22:54:27.3946099Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215043.xml (deflated 44%) 2023-01-11T22:54:27.3946870Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215046.xml (deflated 43%) 2023-01-11T22:54:27.3947635Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215048.xml (deflated 40%) 2023-01-11T22:54:27.3948648Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215057.xml (deflated 41%) 2023-01-11T22:54:27.3949418Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215059.xml (deflated 41%) 2023-01-11T22:54:27.3950186Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215101.xml (deflated 41%) 2023-01-11T22:54:27.3950950Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215110.xml (deflated 42%) 2023-01-11T22:54:27.3951710Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215117.xml (deflated 42%) 2023-01-11T22:54:27.3952486Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215119.xml (deflated 42%) 2023-01-11T22:54:27.3953254Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215121.xml (deflated 42%) 2023-01-11T22:54:27.3954018Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215124.xml (deflated 42%) 2023-01-11T22:54:27.3954767Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215126.xml (deflated 40%) 2023-01-11T22:54:27.3955528Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215135.xml (deflated 40%) 2023-01-11T22:54:27.3956384Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215144.xml (deflated 43%) 2023-01-11T22:54:27.3957188Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215146.xml (deflated 41%) 2023-01-11T22:54:27.3957945Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215148.xml (deflated 41%) 2023-01-11T22:54:27.3958710Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215151.xml (deflated 41%) 2023-01-11T22:54:27.3959476Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215153.xml (deflated 41%) 2023-01-11T22:54:27.3960246Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215156.xml (deflated 41%) 2023-01-11T22:54:27.3961003Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215158.xml (deflated 41%) 2023-01-11T22:54:27.3961827Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215200.xml (deflated 41%) 2023-01-11T22:54:27.3962616Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215203.xml (deflated 41%) 2023-01-11T22:54:27.3963381Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215205.xml (deflated 41%) 2023-01-11T22:54:27.3964129Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215208.xml (deflated 41%) 2023-01-11T22:54:27.3964893Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215214.xml (deflated 41%) 2023-01-11T22:54:27.3965659Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215217.xml (deflated 41%) 2023-01-11T22:54:27.3966426Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215219.xml (deflated 41%) 2023-01-11T22:54:27.3967174Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215222.xml (deflated 41%) 2023-01-11T22:54:27.3967935Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215228.xml (deflated 40%) 2023-01-11T22:54:27.3968694Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215237.xml (deflated 40%) 2023-01-11T22:54:27.3969457Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215245.xml (deflated 41%) 2023-01-11T22:54:27.3970206Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215254.xml (deflated 40%) 2023-01-11T22:54:27.3970978Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215302.xml (deflated 42%) 2023-01-11T22:54:27.3971746Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215309.xml (deflated 42%) 2023-01-11T22:54:27.3972509Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215316.xml (deflated 42%) 2023-01-11T22:54:27.3973853Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215323.xml (deflated 42%) 2023-01-11T22:54:27.3974637Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215329.xml (deflated 40%) 2023-01-11T22:54:27.3975405Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215338.xml (deflated 41%) 2023-01-11T22:54:27.3976280Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215346.xml (deflated 40%) 2023-01-11T22:54:27.3977034Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215354.xml (deflated 41%) 2023-01-11T22:54:27.3977804Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215403.xml (deflated 40%) 2023-01-11T22:54:27.3978572Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215411.xml (deflated 41%) 2023-01-11T22:54:27.3979336Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215419.xml (deflated 41%) 2023-01-11T22:54:27.3980087Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215428.xml (deflated 40%) 2023-01-11T22:54:27.3980866Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215436.xml (deflated 41%) 2023-01-11T22:54:27.3981704Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215444.xml (deflated 41%) 2023-01-11T22:54:27.3982495Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215447.xml (deflated 41%) 2023-01-11T22:54:27.3983247Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215449.xml (deflated 40%) 2023-01-11T22:54:27.3984011Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215451.xml (deflated 42%) 2023-01-11T22:54:27.3984778Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215454.xml (deflated 42%) 2023-01-11T22:54:27.3985552Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215456.xml (deflated 42%) 2023-01-11T22:54:27.3986306Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215459.xml (deflated 41%) 2023-01-11T22:54:27.3987073Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215501.xml (deflated 43%) 2023-01-11T22:54:27.3987835Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215503.xml (deflated 42%) 2023-01-11T22:54:27.3988598Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215506.xml (deflated 42%) 2023-01-11T22:54:27.3989349Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215508.xml (deflated 42%) 2023-01-11T22:54:27.3990120Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215510.xml (deflated 43%) 2023-01-11T22:54:27.3990886Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215513.xml (deflated 42%) 2023-01-11T22:54:27.3991651Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215515.xml (deflated 43%) 2023-01-11T22:54:27.3992415Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215518.xml (deflated 42%) 2023-01-11T22:54:27.3993168Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215520.xml (deflated 42%) 2023-01-11T22:54:27.3993932Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215522.xml (deflated 42%) 2023-01-11T22:54:27.3994693Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215525.xml (deflated 42%) 2023-01-11T22:54:27.3995543Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215527.xml (deflated 42%) 2023-01-11T22:54:27.3996297Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215529.xml (deflated 42%) 2023-01-11T22:54:27.3997063Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215532.xml (deflated 42%) 2023-01-11T22:54:27.3997827Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215534.xml (deflated 43%) 2023-01-11T22:54:27.3998584Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215537.xml (deflated 42%) 2023-01-11T22:54:27.3999331Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215539.xml (deflated 42%) 2023-01-11T22:54:27.4000106Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215541.xml (deflated 42%) 2023-01-11T22:54:27.4000920Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215544.xml (deflated 42%) 2023-01-11T22:54:27.4001703Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215546.xml (deflated 42%) 2023-01-11T22:54:27.4002451Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215549.xml (deflated 41%) 2023-01-11T22:54:27.4003213Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215557.xml (deflated 42%) 2023-01-11T22:54:27.4003976Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215604.xml (deflated 42%) 2023-01-11T22:54:27.4004748Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215606.xml (deflated 42%) 2023-01-11T22:54:27.4005500Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215609.xml (deflated 40%) 2023-01-11T22:54:27.4006264Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215618.xml (deflated 42%) 2023-01-11T22:54:27.4007030Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215620.xml (deflated 42%) 2023-01-11T22:54:27.4007797Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215627.xml (deflated 42%) 2023-01-11T22:54:27.4008544Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215629.xml (deflated 42%) 2023-01-11T22:54:27.4009317Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215636.xml (deflated 42%) 2023-01-11T22:54:27.4010086Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215638.xml (deflated 42%) 2023-01-11T22:54:27.4010852Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215641.xml (deflated 42%) 2023-01-11T22:54:27.4011601Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215643.xml (deflated 40%) 2023-01-11T22:54:27.4012365Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215645.xml (deflated 41%) 2023-01-11T22:54:27.4013902Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215648.xml (deflated 41%) 2023-01-11T22:54:27.4014792Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215650.xml (deflated 41%) 2023-01-11T22:54:27.4015548Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215652.xml (deflated 40%) 2023-01-11T22:54:27.4016314Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215655.xml (deflated 41%) 2023-01-11T22:54:27.4017078Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215657.xml (deflated 41%) 2023-01-11T22:54:27.4017843Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215700.xml (deflated 40%) 2023-01-11T22:54:27.4018593Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215702.xml (deflated 41%) 2023-01-11T22:54:27.4019357Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215704.xml (deflated 41%) 2023-01-11T22:54:27.4020194Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215707.xml (deflated 41%) 2023-01-11T22:54:27.4020981Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215715.xml (deflated 41%) 2023-01-11T22:54:27.4021726Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215718.xml (deflated 40%) 2023-01-11T22:54:27.4022493Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215726.xml (deflated 42%) 2023-01-11T22:54:27.4023258Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215733.xml (deflated 40%) 2023-01-11T22:54:27.4024019Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215741.xml (deflated 42%) 2023-01-11T22:54:27.4024773Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215745.xml (deflated 42%) 2023-01-11T22:54:27.4025543Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215749.xml (deflated 42%) 2023-01-11T22:54:27.4026311Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215753.xml (deflated 41%) 2023-01-11T22:54:27.4027074Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215803.xml (deflated 40%) 2023-01-11T22:54:27.4027820Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215812.xml (deflated 40%) 2023-01-11T22:54:27.4028585Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215822.xml (deflated 41%) 2023-01-11T22:54:27.4029354Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215830.xml (deflated 40%) 2023-01-11T22:54:27.4030120Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215839.xml (deflated 42%) 2023-01-11T22:54:27.4030955Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215843.xml (deflated 42%) 2023-01-11T22:54:27.4031715Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215847.xml (deflated 40%) 2023-01-11T22:54:27.4032479Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215855.xml (deflated 40%) 2023-01-11T22:54:27.4033225Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215903.xml (deflated 40%) 2023-01-11T22:54:27.4034070Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215911.xml (deflated 40%) 2023-01-11T22:54:27.4034844Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215920.xml (deflated 42%) 2023-01-11T22:54:27.4035611Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215924.xml (deflated 40%) 2023-01-11T22:54:27.4036357Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215932.xml (deflated 42%) 2023-01-11T22:54:27.4037123Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215936.xml (deflated 41%) 2023-01-11T22:54:27.4037887Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215945.xml (deflated 42%) 2023-01-11T22:54:27.4038655Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215949.xml (deflated 42%) 2023-01-11T22:54:27.4039456Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111215953.xml (deflated 40%) 2023-01-11T22:54:27.4040240Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220002.xml (deflated 40%) 2023-01-11T22:54:27.4041001Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220011.xml (deflated 42%) 2023-01-11T22:54:27.4041763Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220015.xml (deflated 41%) 2023-01-11T22:54:27.4042512Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220024.xml (deflated 42%) 2023-01-11T22:54:27.4043271Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220026.xml (deflated 42%) 2023-01-11T22:54:27.4044045Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220028.xml (deflated 42%) 2023-01-11T22:54:27.4044810Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220031.xml (deflated 41%) 2023-01-11T22:54:27.4045558Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220033.xml (deflated 41%) 2023-01-11T22:54:27.4046323Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220036.xml (deflated 41%) 2023-01-11T22:54:27.4047088Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220038.xml (deflated 41%) 2023-01-11T22:54:27.4047853Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220040.xml (deflated 41%) 2023-01-11T22:54:27.4048606Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220043.xml (deflated 41%) 2023-01-11T22:54:27.4049371Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220045.xml (deflated 42%) 2023-01-11T22:54:27.4050139Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220048.xml (deflated 42%) 2023-01-11T22:54:27.4050900Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220050.xml (deflated 42%) 2023-01-11T22:54:27.4051649Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220052.xml (deflated 42%) 2023-01-11T22:54:27.4052415Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220056.xml (deflated 41%) 2023-01-11T22:54:27.4054050Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220105.xml (deflated 41%) 2023-01-11T22:54:27.4054839Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220114.xml (deflated 41%) 2023-01-11T22:54:27.4055596Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220122.xml (deflated 40%) 2023-01-11T22:54:27.4056359Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220130.xml (deflated 40%) 2023-01-11T22:54:27.4057132Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220138.xml (deflated 40%) 2023-01-11T22:54:27.4057928Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220156.xml (deflated 42%) 2023-01-11T22:54:27.4058686Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220205.xml (deflated 42%) 2023-01-11T22:54:27.4059555Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220214.xml (deflated 41%) 2023-01-11T22:54:27.4060354Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220222.xml (deflated 41%) 2023-01-11T22:54:27.4061123Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220231.xml (deflated 42%) 2023-01-11T22:54:27.4061869Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220235.xml (deflated 43%) 2023-01-11T22:54:27.4062640Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220239.xml (deflated 42%) 2023-01-11T22:54:27.4063412Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220248.xml (deflated 41%) 2023-01-11T22:54:27.4064181Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220256.xml (deflated 42%) 2023-01-11T22:54:27.4064934Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220300.xml (deflated 41%) 2023-01-11T22:54:27.4065703Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220309.xml (deflated 42%) 2023-01-11T22:54:27.4066467Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220313.xml (deflated 41%) 2023-01-11T22:54:27.4067234Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220322.xml (deflated 41%) 2023-01-11T22:54:27.4067986Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220330.xml (deflated 41%) 2023-01-11T22:54:27.4068765Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220338.xml (deflated 42%) 2023-01-11T22:54:27.4069619Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220342.xml (deflated 42%) 2023-01-11T22:54:27.4070381Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220346.xml (deflated 42%) 2023-01-11T22:54:27.4071129Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220351.xml (deflated 41%) 2023-01-11T22:54:27.4071902Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220359.xml (deflated 41%) 2023-01-11T22:54:27.4072669Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220407.xml (deflated 41%) 2023-01-11T22:54:27.4073533Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220414.xml (deflated 41%) 2023-01-11T22:54:27.4074289Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220421.xml (deflated 42%) 2023-01-11T22:54:27.4075063Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220425.xml (deflated 42%) 2023-01-11T22:54:27.4075828Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220429.xml (deflated 40%) 2023-01-11T22:54:27.4076588Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220436.xml (deflated 41%) 2023-01-11T22:54:27.4077354Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220438.xml (deflated 41%) 2023-01-11T22:54:27.4078111Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220441.xml (deflated 42%) 2023-01-11T22:54:27.4078938Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220443.xml (deflated 41%) 2023-01-11T22:54:27.4079722Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220446.xml (deflated 41%) 2023-01-11T22:54:27.4080484Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220448.xml (deflated 41%) 2023-01-11T22:54:27.4081230Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220450.xml (deflated 40%) 2023-01-11T22:54:27.4081998Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220453.xml (deflated 41%) 2023-01-11T22:54:27.4082771Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220500.xml (deflated 43%) 2023-01-11T22:54:27.4083539Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220502.xml (deflated 41%) 2023-01-11T22:54:27.4084289Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220509.xml (deflated 40%) 2023-01-11T22:54:27.4085058Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220516.xml (deflated 40%) 2023-01-11T22:54:27.4085823Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220523.xml (deflated 40%) 2023-01-11T22:54:27.4086589Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220531.xml (deflated 41%) 2023-01-11T22:54:27.4087338Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220539.xml (deflated 41%) 2023-01-11T22:54:27.4088105Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220547.xml (deflated 40%) 2023-01-11T22:54:27.4088869Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220556.xml (deflated 40%) 2023-01-11T22:54:27.4089632Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220604.xml (deflated 41%) 2023-01-11T22:54:27.4090386Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220628.xml (deflated 41%) 2023-01-11T22:54:27.4091149Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220652.xml (deflated 42%) 2023-01-11T22:54:27.4091916Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220654.xml (deflated 42%) 2023-01-11T22:54:27.4092760Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220657.xml (deflated 41%) 2023-01-11T22:54:27.4094444Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220659.xml (deflated 41%) 2023-01-11T22:54:27.4095216Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220702.xml (deflated 41%) 2023-01-11T22:54:27.4095976Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220704.xml (deflated 42%) 2023-01-11T22:54:27.4096735Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220706.xml (deflated 42%) 2023-01-11T22:54:27.4097488Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220709.xml (deflated 42%) 2023-01-11T22:54:27.4098259Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220711.xml (deflated 42%) 2023-01-11T22:54:27.4099107Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220714.xml (deflated 42%) 2023-01-11T22:54:27.4099895Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220716.xml (deflated 42%) 2023-01-11T22:54:27.4100648Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220718.xml (deflated 42%) 2023-01-11T22:54:27.4101414Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220721.xml (deflated 42%) 2023-01-11T22:54:27.4102178Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220723.xml (deflated 40%) 2023-01-11T22:54:27.4102950Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220730.xml (deflated 40%) 2023-01-11T22:54:27.4103698Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220737.xml (deflated 41%) 2023-01-11T22:54:27.4104464Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220739.xml (deflated 42%) 2023-01-11T22:54:27.4105231Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220741.xml (deflated 42%) 2023-01-11T22:54:27.4105998Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220746.xml (deflated 40%) 2023-01-11T22:54:27.4106748Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220754.xml (deflated 40%) 2023-01-11T22:54:27.4107520Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220803.xml (deflated 41%) 2023-01-11T22:54:27.4108291Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220811.xml (deflated 42%) 2023-01-11T22:54:27.4109058Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220815.xml (deflated 41%) 2023-01-11T22:54:27.4109804Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220819.xml (deflated 40%) 2023-01-11T22:54:27.4110570Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220826.xml (deflated 40%) 2023-01-11T22:54:27.4111334Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220833.xml (deflated 42%) 2023-01-11T22:54:27.4112098Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220837.xml (deflated 40%) 2023-01-11T22:54:27.4112950Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220845.xml (deflated 41%) 2023-01-11T22:54:27.4113718Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220854.xml (deflated 40%) 2023-01-11T22:54:27.4114480Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220902.xml (deflated 41%) 2023-01-11T22:54:27.4115245Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220911.xml (deflated 42%) 2023-01-11T22:54:27.4115999Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220917.xml (deflated 43%) 2023-01-11T22:54:27.4116768Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220924.xml (deflated 42%) 2023-01-11T22:54:27.4117539Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220931.xml (deflated 42%) 2023-01-11T22:54:27.4118353Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220938.xml (deflated 41%) 2023-01-11T22:54:27.4119120Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220946.xml (deflated 41%) 2023-01-11T22:54:27.4119886Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220954.xml (deflated 42%) 2023-01-11T22:54:27.4120650Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111220957.xml (deflated 41%) 2023-01-11T22:54:27.4121411Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221005.xml (deflated 43%) 2023-01-11T22:54:27.4122163Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221007.xml (deflated 44%) 2023-01-11T22:54:27.4122933Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221010.xml (deflated 41%) 2023-01-11T22:54:27.4123698Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221018.xml (deflated 42%) 2023-01-11T22:54:27.4124461Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221021.xml (deflated 42%) 2023-01-11T22:54:27.4125222Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221023.xml (deflated 41%) 2023-01-11T22:54:27.4125972Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221031.xml (deflated 42%) 2023-01-11T22:54:27.4126742Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221034.xml (deflated 41%) 2023-01-11T22:54:27.4127508Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221036.xml (deflated 41%) 2023-01-11T22:54:27.4128272Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221039.xml (deflated 42%) 2023-01-11T22:54:27.4129023Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221041.xml (deflated 42%) 2023-01-11T22:54:27.4129790Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221043.xml (deflated 41%) 2023-01-11T22:54:27.4130551Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221046.xml (deflated 41%) 2023-01-11T22:54:27.4131394Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221048.xml (deflated 41%) 2023-01-11T22:54:27.4132146Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221051.xml (deflated 41%) 2023-01-11T22:54:27.4133546Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221059.xml (deflated 42%) 2023-01-11T22:54:27.4134411Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221101.xml (deflated 42%) 2023-01-11T22:54:27.4135178Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221103.xml (deflated 42%) 2023-01-11T22:54:27.4135933Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221106.xml (deflated 41%) 2023-01-11T22:54:27.4136698Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221114.xml (deflated 42%) 2023-01-11T22:54:27.4137551Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221116.xml (deflated 41%) 2023-01-11T22:54:27.4138344Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221119.xml (deflated 41%) 2023-01-11T22:54:27.4139097Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221121.xml (deflated 41%) 2023-01-11T22:54:27.4139859Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221130.xml (deflated 41%) 2023-01-11T22:54:27.4140622Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221138.xml (deflated 41%) 2023-01-11T22:54:27.4141387Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221146.xml (deflated 40%) 2023-01-11T22:54:27.4142141Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221155.xml (deflated 42%) 2023-01-11T22:54:27.4142912Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221157.xml (deflated 42%) 2023-01-11T22:54:27.4143682Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221200.xml (deflated 42%) 2023-01-11T22:54:27.4144446Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221208.xml (deflated 41%) 2023-01-11T22:54:27.4145195Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221216.xml (deflated 42%) 2023-01-11T22:54:27.4146004Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221218.xml (deflated 41%) 2023-01-11T22:54:27.4146776Z adding: 
test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221227.xml (deflated 40%) 2023-01-11T22:54:27.4147542Z adding: test/test-reports/dist-ucc/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20230111221240.xml (deflated 41%) 2023-01-11T22:54:27.4168250Z ##[group]Run # Remove any previous test reports if they exist 2023-01-11T22:54:27.4168630Z # Remove any previous test reports if they exist 2023-01-11T22:54:27.4168953Z rm -f usage-log-*.zip 2023-01-11T22:54:27.4169334Z # this workflow is also run in bazel build test, but we dont generate usage reports for it 2023-01-11T22:54:27.4169731Z # so check to see if the file exists first 2023-01-11T22:54:27.4170023Z if [ -f 'usage_log.txt' ]; then 2023-01-11T22:54:27.4170356Z  zip "usage-log-${FILE_SUFFIX}.zip" 'usage_log.txt' 2023-01-11T22:54:27.4170771Z fi 2023-01-11T22:54:27.4183553Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:54:27.4183855Z env: 2023-01-11T22:54:27.4184102Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:54:27.4184357Z GPU_FLAG: --gpus all 2023-01-11T22:54:27.4184741Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:54:27.4185229Z FILE_SUFFIX: test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589560684 2023-01-11T22:54:27.4185583Z ##[endgroup] 2023-01-11T22:54:27.4999088Z adding: usage_log.txt (deflated 95%) 2023-01-11T22:54:27.5044816Z ##[group]Run seemethere/upload-artifact-s3@v5 2023-01-11T22:54:27.5045110Z with: 2023-01-11T22:54:27.5045389Z s3-prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T22:54:27.5045670Z retention-days: 14 2023-01-11T22:54:27.5045939Z if-no-files-found: warn 2023-01-11T22:54:27.5046212Z path: test-jsons-*.zip 2023-01-11T22:54:27.5046449Z name: artifact 2023-01-11T22:54:27.5046700Z s3-bucket: gha-artifacts 2023-01-11T22:54:27.5046974Z region: us-east-1 2023-01-11T22:54:27.5047187Z env: 2023-01-11T22:54:27.5047427Z GIT_DEFAULT_BRANCH: 
master 2023-01-11T22:54:27.5047694Z GPU_FLAG: --gpus all 2023-01-11T22:54:27.5048153Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:54:27.5048529Z ##[endgroup] 2023-01-11T22:54:27.9529186Z NOTE: s3-prefix specified, ignoring name parameter 2023-01-11T22:54:27.9530120Z With the provided path, there will be 1 file uploaded 2023-01-11T22:54:27.9530688Z Uploading to s3 prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T22:54:27.9542194Z Starting upload of test-jsons-test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589560684.zip 2023-01-11T22:54:28.1135697Z Finished upload of test-jsons-test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589560684.zip 2023-01-11T22:54:28.1293682Z ##[group]Run seemethere/upload-artifact-s3@v5 2023-01-11T22:54:28.1293986Z with: 2023-01-11T22:54:28.1294270Z s3-prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T22:54:28.1294567Z retention-days: 14 2023-01-11T22:54:28.1294844Z if-no-files-found: error 2023-01-11T22:54:28.1295130Z path: test-reports-*.zip 2023-01-11T22:54:28.1295370Z name: artifact 2023-01-11T22:54:28.1295637Z s3-bucket: gha-artifacts 2023-01-11T22:54:28.1295903Z region: us-east-1 2023-01-11T22:54:28.1296120Z env: 2023-01-11T22:54:28.1296361Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:54:28.1296631Z GPU_FLAG: --gpus all 2023-01-11T22:54:28.1296988Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:54:28.1297350Z ##[endgroup] 2023-01-11T22:54:28.5695183Z NOTE: s3-prefix specified, ignoring name parameter 2023-01-11T22:54:28.5695948Z With the provided path, there will be 1 file uploaded 2023-01-11T22:54:28.5696769Z Uploading to s3 prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T22:54:28.5707791Z Starting upload of test-reports-test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589560684.zip 2023-01-11T22:54:28.7355651Z Finished upload of test-reports-test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589560684.zip 
2023-01-11T22:54:28.7526686Z ##[group]Run seemethere/upload-artifact-s3@v5 2023-01-11T22:54:28.7526991Z with: 2023-01-11T22:54:28.7527289Z s3-prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T22:54:28.7527599Z retention-days: 14 2023-01-11T22:54:28.7527862Z if-no-files-found: ignore 2023-01-11T22:54:28.7528148Z path: usage-log-*.zip 2023-01-11T22:54:28.7528411Z name: artifact 2023-01-11T22:54:28.7528656Z s3-bucket: gha-artifacts 2023-01-11T22:54:28.7528927Z region: us-east-1 2023-01-11T22:54:28.7529165Z env: 2023-01-11T22:54:28.7529395Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:54:28.7529674Z GPU_FLAG: --gpus all 2023-01-11T22:54:28.7530053Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:54:28.7530402Z ##[endgroup] 2023-01-11T22:54:29.2009837Z NOTE: s3-prefix specified, ignoring name parameter 2023-01-11T22:54:29.2010906Z With the provided path, there will be 1 file uploaded 2023-01-11T22:54:29.2011299Z Uploading to s3 prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T22:54:29.2022637Z Starting upload of usage-log-test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589560684.zip 2023-01-11T22:54:29.3896806Z Finished upload of usage-log-test-distributed-3-3-linux.8xlarge.nvidia.gpu_10589560684.zip 2023-01-11T22:54:29.4059384Z ##[group]Run # shellcheck disable=SC2156 2023-01-11T22:54:29.4059783Z # shellcheck disable=SC2156 2023-01-11T22:54:29.4060188Z find . 
-iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2023-01-11T22:54:29.4074102Z shell: /usr/bin/bash -e {0} 2023-01-11T22:54:29.4074378Z env: 2023-01-11T22:54:29.4074611Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:54:29.4074894Z GPU_FLAG: --gpus all 2023-01-11T22:54:29.4075278Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:54:29.4075657Z ##[endgroup] 2023-01-11T22:54:29.7341215Z ##[group]Run set -x 2023-01-11T22:54:29.7341531Z set -x 2023-01-11T22:54:29.7341817Z python3 -m pip install -r requirements.txt 2023-01-11T22:54:29.7342159Z python3 -m pip install boto3==1.19.12 2023-01-11T22:54:29.7342556Z python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test 2023-01-11T22:54:29.7355598Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:54:29.7355877Z env: 2023-01-11T22:54:29.7356121Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:54:29.7356389Z GPU_FLAG: --gpus all 2023-01-11T22:54:29.7356745Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:54:29.7357125Z AWS_DEFAULT_REGION: us-east-1 2023-01-11T22:54:29.7357383Z BRANCH: 2023-01-11T22:54:29.7357607Z TEST_CONFIG: distributed 2023-01-11T22:54:29.7357860Z SHARD_NUMBER: 3 2023-01-11T22:54:29.7358176Z BUILD_ENVIRONMENT: linux-bionic-cuda11.7-py3.10-gcc7 2023-01-11T22:54:29.7358518Z PR_NUMBER: 2023-01-11T22:54:29.7358779Z PYTORCH_RETRY_TEST_CASES: 1 2023-01-11T22:54:29.7359050Z PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1 2023-01-11T22:54:29.7359372Z SHA1: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T22:54:29.7359673Z TAG: ciflow/trunk/91627 2023-01-11T22:54:29.7359916Z WORKFLOW_ID: 3896346758 2023-01-11T22:54:29.7360356Z GITHUB_TOKEN: *** 2023-01-11T22:54:29.7360628Z GHA_WORKFLOW_JOB_ID: 10589560684 2023-01-11T22:54:29.7360868Z ##[endgroup] 2023-01-11T22:54:29.7390315Z + python3 -m pip install -r requirements.txt 
2023-01-11T22:54:30.0364163Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T22:54:30.0724670Z Requirement already satisfied: astunparse in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 2)) (1.6.3) 2023-01-11T22:54:30.0764368Z Requirement already satisfied: expecttest in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 3)) (0.1.4) 2023-01-11T22:54:30.0776650Z Requirement already satisfied: future in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 4)) (0.18.2) 2023-01-11T22:54:30.0789078Z Requirement already satisfied: hypothesis in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 5)) (6.62.0) 2023-01-11T22:54:30.1340121Z Requirement already satisfied: numpy in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 6)) (1.21.6) 2023-01-11T22:54:30.1351713Z Requirement already satisfied: psutil in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 7)) (5.9.1) 2023-01-11T22:54:30.1462256Z Requirement already satisfied: pyyaml in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 8)) (6.0) 2023-01-11T22:54:30.1473106Z Requirement already satisfied: requests in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 9)) (2.26.0) 2023-01-11T22:54:30.1727878Z Requirement already satisfied: setuptools in /usr/lib/python3.7/site-packages (from -r requirements.txt (line 10)) (49.1.3) 2023-01-11T22:54:30.1973881Z Requirement already satisfied: six in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 11)) (1.16.0) 2023-01-11T22:54:30.1985391Z Requirement already satisfied: types-dataclasses in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 12)) (0.6.6) 2023-01-11T22:54:30.1993561Z Requirement already satisfied: 
typing_extensions in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 13)) (4.4.0) 2023-01-11T22:54:30.2008542Z Requirement already satisfied: sympy in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 14)) (1.10.1) 2023-01-11T22:54:30.2034393Z Requirement already satisfied: filelock in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 15)) (3.9.0) 2023-01-11T22:54:30.2139054Z Requirement already satisfied: networkx in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 16)) (2.6.3) 2023-01-11T22:54:30.2373984Z Requirement already satisfied: jinja2 in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 17)) (3.1.2) 2023-01-11T22:54:30.2408671Z Requirement already satisfied: wheel<1.0,>=0.23.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from astunparse->-r requirements.txt (line 2)) (0.38.4) 2023-01-11T22:54:30.2431537Z Requirement already satisfied: exceptiongroup>=1.0.0; python_version < "3.11" in /home/ec2-user/.local/lib/python3.7/site-packages (from hypothesis->-r requirements.txt (line 5)) (1.1.0) 2023-01-11T22:54:30.2460141Z Requirement already satisfied: attrs>=19.2.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from hypothesis->-r requirements.txt (line 5)) (22.2.0) 2023-01-11T22:54:30.2829208Z Requirement already satisfied: sortedcontainers<3.0.0,>=2.1.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from hypothesis->-r requirements.txt (line 5)) (2.4.0) 2023-01-11T22:54:30.2843314Z Requirement already satisfied: idna<4,>=2.5; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (3.4) 2023-01-11T22:54:30.2859716Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (1.26.14) 2023-01-11T22:54:30.3083978Z 
Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (2022.12.7) 2023-01-11T22:54:30.3096499Z Requirement already satisfied: charset-normalizer~=2.0.0; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (2.0.12) 2023-01-11T22:54:30.3121596Z Requirement already satisfied: mpmath>=0.19 in /home/ec2-user/.local/lib/python3.7/site-packages (from sympy->-r requirements.txt (line 14)) (1.2.1) 2023-01-11T22:54:30.3200923Z Requirement already satisfied: MarkupSafe>=2.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from jinja2->-r requirements.txt (line 17)) (2.1.1) 2023-01-11T22:54:30.3897854Z + python3 -m pip install boto3==1.19.12 2023-01-11T22:54:30.6840141Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T22:54:30.7069058Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12) 2023-01-11T22:54:30.7138765Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12) 2023-01-11T22:54:30.7199208Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0) 2023-01-11T22:54:30.7226313Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2) 2023-01-11T22:54:30.7262445Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.14) 2023-01-11T22:54:30.7484868Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2) 2023-01-11T22:54:30.7510561Z Requirement already 
satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0) 2023-01-11T22:54:30.9982853Z + python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test 2023-01-11T22:58:49.6678585Z [scribe] Scribe access token not provided, sending report via boto3... 2023-01-11T22:58:49.6679478Z ERROR ENCOUNTERED WHEN UPLOADING TO SCRIBE: {"errorMessage":"2023-01-11T22:58:33.083Z 82b3513e-687d-4821-abfc-32901da0d282 Task timed out after 60.00 seconds"} 2023-01-11T22:58:49.6680526Z 2023-01-11T22:58:49.6683211Z ----- Historic stats comparison result ------ 2023-01-11T22:58:49.6683475Z 2023-01-11T22:58:49.6683745Z job: linux-bionic-cuda11.7-py3.10-gcc7 2023-01-11T22:58:49.6684107Z commit: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T22:58:49.6684320Z 2023-01-11T22:58:49.6684534Z Commit graph (base is most recent master ancestor with at least one S3 report): 2023-01-11T22:58:49.6684783Z 2023-01-11T22:58:49.6687905Z : (master) 2023-01-11T22:58:49.6688242Z | 2023-01-11T22:58:49.6688531Z | * 8419ddda87 (HEAD) total time 2817.52s 2023-01-11T22:58:49.6688800Z | | 2023-01-11T22:58:49.6689007Z | : (2 commits) 2023-01-11T22:58:49.6689231Z |/ 2023-01-11T22:58:49.6692107Z * db2a237763 (base) 11 reports, total time 4966.48s ± 3495.23s 2023-01-11T22:58:49.6692597Z * 2b0abd4ce3 11 reports, total time 4990.75s ± 3463.36s 2023-01-11T22:58:49.6693371Z * f7939b21e1 33 reports, total time 3500.97s ± 3537.57s 2023-01-11T22:58:49.6693890Z * cb3204823e 11 reports, total time 4951.99s ± 3458.09s 2023-01-11T22:58:49.6694295Z * 6e236553f5 11 reports, total time 4964.39s ± 3513.19s 2023-01-11T22:58:49.6694721Z * cce577b391 11 reports, total time 4938.35s ± 3358.29s 2023-01-11T22:58:49.6695140Z * fae821c2f1 11 reports, total time 4751.06s ± 3169.39s 2023-01-11T22:58:49.6695561Z * 0c3659586d 11 reports, total time 4713.33s ± 3185.77s 2023-01-11T22:58:49.6695957Z * 122245985a 11 reports, 
total time 4767.91s ± 3184.40s 2023-01-11T22:58:49.6696369Z * b797a24259 11 reports, total time 4784.26s ± 3253.10s 2023-01-11T22:58:49.6696641Z | 2023-01-11T22:58:49.6696833Z : 2023-01-11T22:58:49.6696978Z 2023-01-11T22:58:49.6697148Z Removed (across 1399 suites) 0 tests, totaling 0.00s 2023-01-11T22:58:49.6697499Z Modified (across 0 suites) 0 tests, totaling 0.00s 2023-01-11T22:58:49.6697830Z Added (across 69 suites) 796 tests, totaling +3596.48s 2023-01-11T22:58:49.7283716Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2023-01-11T22:58:49.7284083Z with: 2023-01-11T22:58:49.7284281Z env: 2023-01-11T22:58:49.7284522Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:58:49.7284790Z GPU_FLAG: --gpus all 2023-01-11T22:58:49.7285145Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:58:49.7285504Z ##[endgroup] 2023-01-11T22:58:49.7303503Z ##[group]Run set -eou pipefail 2023-01-11T22:58:49.7303797Z set -eou pipefail 2023-01-11T22:58:49.7304051Z  2023-01-11T22:58:49.7304375Z echo "Holding runner for 2 hours until all ssh sessions have logged out" 2023-01-11T22:58:49.7304702Z for _ in $(seq 1440); do 2023-01-11T22:58:49.7305011Z  # Break if no ssh session exists anymore 2023-01-11T22:58:49.7305313Z  if [ "$(who)" = "" ]; then 2023-01-11T22:58:49.7305569Z  break 2023-01-11T22:58:49.7305823Z  fi 2023-01-11T22:58:49.7306250Z  echo "." 
2023-01-11T22:58:49.7306475Z  sleep 5 2023-01-11T22:58:49.7306710Z done 2023-01-11T22:58:49.7319844Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:58:49.7320127Z env: 2023-01-11T22:58:49.7320368Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:58:49.7320634Z GPU_FLAG: --gpus all 2023-01-11T22:58:49.7320989Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:58:49.7321345Z ##[endgroup] 2023-01-11T22:58:49.7350708Z Holding runner for 2 hours until all ssh sessions have logged out 2023-01-11T22:58:49.7398412Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2023-01-11T22:58:49.7398822Z # ignore expansion of "docker ps -q" since it could be empty 2023-01-11T22:58:49.7399168Z # shellcheck disable=SC2046 2023-01-11T22:58:49.7399483Z docker stop $(docker ps -q) || true 2023-01-11T22:58:49.7399811Z # Prune all of the docker images 2023-01-11T22:58:49.7400096Z docker system prune -af 2023-01-11T22:58:49.7412248Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T22:58:49.7412548Z env: 2023-01-11T22:58:49.7412775Z GIT_DEFAULT_BRANCH: master 2023-01-11T22:58:49.7413338Z GPU_FLAG: --gpus all 2023-01-11T22:58:49.7413717Z DOCKER_CONTAINER_ID: 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:58:49.7414062Z ##[endgroup] 2023-01-11T22:58:50.4759698Z 7e0e28e30a97 2023-01-11T22:58:51.5757880Z Deleted Containers: 2023-01-11T22:58:51.5758285Z 7e0e28e30a97bf7f6b33ecd6baec4f17d41fa21050070b505f44b9fd668269e6 2023-01-11T22:58:51.5758538Z 2023-01-11T22:58:56.3130285Z Deleted Images: 2023-01-11T22:58:56.3131126Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T22:58:56.3132140Z untagged: 
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7@sha256:0da23f4faf0ce20770149c4a783e08eaa91c07112511dc5511c77937c66edb24 2023-01-11T22:58:56.3132772Z deleted: sha256:dd055998e88c3bb7db98caef99cc4aaaa492114a459a38a5f0ab49c735f40318 2023-01-11T22:58:56.3133495Z deleted: sha256:e4008aa27d9451086197883cfac22b827879bbe380f63c8c39e3db8313773f3c 2023-01-11T22:58:56.3133945Z deleted: sha256:acc638ed73c788f1c8fdbf04e65d27fa42e6c32d67dbdb50616e173ef284a563 2023-01-11T22:58:56.3134388Z deleted: sha256:f5db2d6ac11f27c63a5f2d0250a45efbd078c37a32d2e2973e544ea526501ba9 2023-01-11T22:58:56.3134832Z deleted: sha256:ce58ef265c69d549d5071cb5418ec43a703978b2c7a88b1141673272cd29de77 2023-01-11T22:58:56.3135260Z deleted: sha256:2e1d5c5ea4a63305e9617c6ab380960f330e1755c39c714f3a3eefb2b603e92a 2023-01-11T22:58:56.3135674Z deleted: sha256:f6e5b9727392978f694412774dde4231ab32666b8604ff0d64727308d45a9163 2023-01-11T22:58:56.3136084Z deleted: sha256:0cfd1beb2f29eb03cb53178c729bf68f37f74bbd713aa6e3a6b1dd0b8121eb61 2023-01-11T22:58:56.3136752Z deleted: sha256:8ec9997c9a1e58cb7d553f0862f1f975bdc1bbd0d0297b6c483bf8731508824b 2023-01-11T22:58:56.3137200Z deleted: sha256:367991b9b8c74307719e7037512b4d1d67917a888c40c4ed86af9843f0c38c77 2023-01-11T22:58:56.3137606Z deleted: sha256:83b736f4ed1139df1136ab38e29568f048e4ce111951d4fdc1a508713303ca00 2023-01-11T22:58:56.3138027Z deleted: sha256:d2ca56a8d719ce8cfe4b65b165cdee296b1b456c56eb3e06990ca62ba42bc18a 2023-01-11T22:58:56.3138471Z deleted: sha256:cdfc9a191a57609f700ced389b19973ef4436b66b1d47313c799be166d6fce4b 2023-01-11T22:58:56.3138894Z deleted: sha256:293a82b424c32cb77ae038879dff6856d5fd08e5b9db2c5b015f424ebf88d24b 2023-01-11T22:58:56.3139311Z deleted: sha256:21a63ac2015ba16ccb51b9090e0f4b78a8437ca4dd189f376d1e9f45e6c74d3e 2023-01-11T22:58:56.3139745Z deleted: sha256:afc58208e3d75e15f79b0f474b361bbc1e21b8eb232fe613a1c79b8415827c86 2023-01-11T22:58:56.3140171Z deleted: 
sha256:86179776c9e36c00cf4c4a64f717303555b00dc594ab664fb29cfdb707eece2b 2023-01-11T22:58:56.3140579Z deleted: sha256:45f06a2900391711a143b22a31156d3013a0f37430ff8367ab4dbb27ac33381e 2023-01-11T22:58:56.3141100Z deleted: sha256:d7d350150f2a9a9ac81a59d44fd01f74d44f5f354f07a2f055aea97c8be52d92 2023-01-11T22:58:56.3141551Z deleted: sha256:eae59ad1f09ea2b1f99a568bfd5f85dae44a3af9b46b5ac5af19931de9e8fb8d 2023-01-11T22:58:56.3142007Z deleted: sha256:d734a02f47eda33d1022c9642ccfc469b44f5b104ab8ae7aa855ea76be550288 2023-01-11T22:58:56.3142443Z deleted: sha256:643cbebfdfe0cee71fed16a3233b2ee4b6392d91833a41269cdc80c8d0841ae9 2023-01-11T22:58:56.3142899Z deleted: sha256:3a4db9e7d414af3a157be5297f5b7bcddc4c63c0e83221d36728ddc32f84bc2d 2023-01-11T22:58:56.3143356Z deleted: sha256:d93fc8ff6f3d356d012ef7cca3d02018fa3c0a4be9ca2ae4ce78d174274ce530 2023-01-11T22:58:56.3143848Z deleted: sha256:cdd9af1450d9ea8f070e7c03dfe076d49ab615a4d3558be68bd7a4f16d804038 2023-01-11T22:58:56.3144270Z deleted: sha256:de611f0f78a22b1c6c68620370fb99a959c668d8c38a2cc832912e784f94869f 2023-01-11T22:58:56.3144693Z deleted: sha256:0c7878e2d089271e8e5c181eef49d0a43c99b827d1a60dafd15a54627d9146b4 2023-01-11T22:58:56.3145130Z deleted: sha256:55fe9bdaa629b37baee75e1f6878ab1941e000e0937ef45452a9482f29de4577 2023-01-11T22:58:56.3145565Z deleted: sha256:1e641da8b29087f06ad852506d00c1eaaedaeed0bf2a451c1d462c0b476c169f 2023-01-11T22:58:56.3146004Z deleted: sha256:77079a0a22a485736fdd6052b5270d4fb8ee1771976ed0d55e0f315cbc6d1da5 2023-01-11T22:58:56.3146426Z deleted: sha256:b3dda998ed6389c88d249f3aa8d96b2f94876931c45284ef00425cdea77b7c07 2023-01-11T22:58:56.3146855Z deleted: sha256:7dcf69834443047244edadb0a7016bc391279d485de12e8d2d8aad25af532912 2023-01-11T22:58:56.3147271Z deleted: sha256:932db4f0d0b27a0b32d81f5e108fdec1c5433e7b4614cfdc39d64189a59bc228 2023-01-11T22:58:56.3147720Z deleted: sha256:99ea7bfa823ad0725ffcc42e5dd47b90270c3a3e49a0b2a31adae4497d029331 2023-01-11T22:58:56.3148167Z deleted: 
sha256:58c3ab544a412f163fd3613ab991ea85fa1ae7c97f5a6cbc7b86a1b97fdd5484 2023-01-11T22:58:56.3148586Z deleted: sha256:fd75951227e5b4de5b96f5ee360cb3f1caf2f32a33ca89976013a385d23a34be 2023-01-11T22:58:56.3149028Z deleted: sha256:980d2a371b758adf16fc78370ed6b8bcf77846721ef4f20da94a6d1299457ad6 2023-01-11T22:58:56.3149441Z deleted: sha256:311c72e743e14e8490d04f4331dcc0a35309e9b94266986b0b5badd5fe499765 2023-01-11T22:58:56.3149846Z deleted: sha256:857d2698f0a0af2341318f6ed93060f21d498e989b339e2867c5236bab9c63d5 2023-01-11T22:58:56.3150241Z deleted: sha256:df69730c501d9b3ce0f2316b2b638e20515ce1e9aad01098b13234f4a2154927 2023-01-11T22:58:56.3150665Z deleted: sha256:288a6d7efd5c9e470341b16b3ea2cd41124c769de8d10643ac688d651e9767c9 2023-01-11T22:58:56.3151087Z deleted: sha256:6a565f7b04668396c32d11cad845e1cd1b84d09e6f970457c5494adfde59690f 2023-01-11T22:58:56.3151503Z deleted: sha256:b4e45ce76a79fe9f3dbbf723401c4bf189f0e70bc81fc2dbe4e70c80044c2fac 2023-01-11T22:58:56.3151948Z deleted: sha256:e4a3cd7f0f84ce7171b04c994248707950295feb437bc5feaadcb66a3f7bf5a3 2023-01-11T22:58:56.3152371Z deleted: sha256:24ac908f9f592af03d6a51011147428f9e682c79b8ca2ad5afd2b1a44aeed617 2023-01-11T22:58:56.3152865Z deleted: sha256:50cc7186fda7f64aa964824a32c139a1085ce030491c0da5fab99fdecae66fdf 2023-01-11T22:58:56.3153291Z deleted: sha256:6e9c802974cfa887b7300715840ef7aaf5765df415d8d4680f72c6034a10292b 2023-01-11T22:58:56.3153694Z deleted: sha256:30109e8b967541225d66d116256e57334c2b63c25b456f9f7cd72d14d46d8da3 2023-01-11T22:58:56.3154102Z deleted: sha256:18ce8ec73f72efbc789a00688f5d57c798690e22048389f236dbc593cec31d6e 2023-01-11T22:58:56.3154504Z deleted: sha256:195741932e0b070b4fed22eee8d97719dc71f1f569594b418d777b87dbe76a6d 2023-01-11T22:58:56.3154925Z deleted: sha256:6f099faae794c47a468400004f89aed66ec84fa1bd6c606a9877ab09c84a5289 2023-01-11T22:58:56.3155348Z deleted: sha256:5bddaa98761511a0e16047132a49704d0cf176bec84f42b91644b8e7adb3cb88 2023-01-11T22:58:56.3155741Z deleted: 
sha256:5089072a88c6788d2594696a16346c495f97fd117430602f033541a0f333de5f 2023-01-11T22:58:56.3156121Z deleted: sha256:9bc67bb187c368480f186819831faa7998ba6d4f2e4ab8bd5b5fbc8a5aada045 2023-01-11T22:58:56.3156533Z deleted: sha256:45bbe3d22998589317c7f6c4dd591475423bb37ca9b922529c5878653483b18d 2023-01-11T22:58:56.3156834Z 2023-01-11T22:58:56.3167931Z Total reclaimed space: 19.53GB 2023-01-11T22:58:56.3229579Z Post job cleanup. 2023-01-11T22:58:56.3267321Z Post job cleanup. 2023-01-11T22:58:56.4651589Z [command]/usr/bin/git version 2023-01-11T22:58:56.4708930Z git version 2.38.1 2023-01-11T22:58:56.4772788Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/f74a5d7c-b56a-4d61-8390-0e631427903a' before making global git config changes 2023-01-11T22:58:56.4775044Z Adding repository directory to the temporary git global config as a safe directory 2023-01-11T22:58:56.4779925Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2023-01-11T22:58:56.4821300Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2023-01-11T22:58:56.4860076Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || : 2023-01-11T22:58:56.5179334Z Entering 'android/libs/fbjni' 2023-01-11T22:58:56.5221418Z Entering 'third_party/FP16' 2023-01-11T22:58:56.5265756Z Entering 'third_party/FXdiv' 2023-01-11T22:58:56.5307453Z Entering 'third_party/NNPACK' 2023-01-11T22:58:56.5349340Z Entering 'third_party/QNNPACK' 2023-01-11T22:58:56.5394388Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T22:58:56.5437211Z Entering 'third_party/XNNPACK' 2023-01-11T22:58:56.5492643Z Entering 'third_party/benchmark' 2023-01-11T22:58:56.5533772Z Entering 'third_party/cpuinfo' 2023-01-11T22:58:56.5576082Z Entering 'third_party/cub' 2023-01-11T22:58:56.5619709Z Entering 'third_party/cudnn_frontend' 
2023-01-11T22:58:56.5667850Z Entering 'third_party/cutlass' 2023-01-11T22:58:56.5717562Z Entering 'third_party/eigen' 2023-01-11T22:58:56.5763325Z Entering 'third_party/fbgemm' 2023-01-11T22:58:56.5805408Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T22:58:56.5848217Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T22:58:56.5890095Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T22:58:56.5932076Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T22:58:56.5976022Z Entering 'third_party/flatbuffers' 2023-01-11T22:58:56.6021698Z Entering 'third_party/fmt' 2023-01-11T22:58:56.6063235Z Entering 'third_party/foxi' 2023-01-11T22:58:56.6105144Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T22:58:56.6146608Z Entering 'third_party/gloo' 2023-01-11T22:58:56.6188539Z Entering 'third_party/googletest' 2023-01-11T22:58:56.6229886Z Entering 'third_party/ideep' 2023-01-11T22:58:56.6271358Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T22:58:56.6315633Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T22:58:56.6365826Z Entering 'third_party/ios-cmake' 2023-01-11T22:58:56.6407998Z Entering 'third_party/ittapi' 2023-01-11T22:58:56.6449960Z Entering 'third_party/kineto' 2023-01-11T22:58:56.6492736Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T22:58:56.6535389Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T22:58:56.6579307Z Entering 'third_party/nccl/nccl' 2023-01-11T22:58:56.6623174Z Entering 'third_party/neon2sse' 2023-01-11T22:58:56.6665419Z Entering 'third_party/nlohmann' 2023-01-11T22:58:56.6708418Z Entering 'third_party/onnx' 2023-01-11T22:58:56.6764412Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T22:58:56.6806676Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T22:58:56.6853752Z Entering 'third_party/onnx-tensorrt' 2023-01-11T22:58:56.6896733Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 
2023-01-11T22:58:56.6944119Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T22:58:56.6986633Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T22:58:56.7027936Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T22:58:56.7075407Z Entering 'third_party/pocketfft' 2023-01-11T22:58:56.7117751Z Entering 'third_party/protobuf' 2023-01-11T22:58:56.7164408Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T22:58:56.7205848Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T22:58:56.7249042Z Entering 'third_party/psimd' 2023-01-11T22:58:56.7290908Z Entering 'third_party/pthreadpool' 2023-01-11T22:58:56.7334092Z Entering 'third_party/pybind11' 2023-01-11T22:58:56.7376097Z Entering 'third_party/python-enum' 2023-01-11T22:58:56.7417900Z Entering 'third_party/python-peachpy' 2023-01-11T22:58:56.7459587Z Entering 'third_party/python-six' 2023-01-11T22:58:56.7500955Z Entering 'third_party/sleef' 2023-01-11T22:58:56.7543279Z Entering 'third_party/tbb' 2023-01-11T22:58:56.7588383Z Entering 'third_party/tensorpipe' 2023-01-11T22:58:56.7631608Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T22:58:56.7673329Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T22:58:56.7714648Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T22:58:56.7756297Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T22:58:56.7797049Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T22:58:56.7840974Z Entering 'third_party/zstd' 2023-01-11T22:58:56.7900799Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2023-01-11T22:58:56.7930888Z http.https://github.com/.extraheader 2023-01-11T22:58:56.7942313Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2023-01-11T22:58:56.7980736Z 
[command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || : 2023-01-11T22:58:56.8293643Z Entering 'android/libs/fbjni' 2023-01-11T22:58:56.8317895Z http.https://github.com/.extraheader 2023-01-11T22:58:56.8350284Z Entering 'third_party/FP16' 2023-01-11T22:58:56.8375532Z http.https://github.com/.extraheader 2023-01-11T22:58:56.8407341Z Entering 'third_party/FXdiv' 2023-01-11T22:58:56.8432713Z http.https://github.com/.extraheader 2023-01-11T22:58:56.8464865Z Entering 'third_party/NNPACK' 2023-01-11T22:58:56.8488993Z http.https://github.com/.extraheader 2023-01-11T22:58:56.8522324Z Entering 'third_party/QNNPACK' 2023-01-11T22:58:56.8547485Z http.https://github.com/.extraheader 2023-01-11T22:58:56.8581666Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T22:58:56.8606211Z http.https://github.com/.extraheader 2023-01-11T22:58:56.8638773Z Entering 'third_party/XNNPACK' 2023-01-11T22:58:56.8663603Z http.https://github.com/.extraheader 2023-01-11T22:58:56.8707221Z Entering 'third_party/benchmark' 2023-01-11T22:58:56.8732376Z http.https://github.com/.extraheader 2023-01-11T22:58:56.8764483Z Entering 'third_party/cpuinfo' 2023-01-11T22:58:56.8789796Z http.https://github.com/.extraheader 2023-01-11T22:58:56.8822868Z Entering 'third_party/cub' 2023-01-11T22:58:56.8847090Z http.https://github.com/.extraheader 2023-01-11T22:58:56.8879970Z Entering 'third_party/cudnn_frontend' 2023-01-11T22:58:56.8905217Z http.https://github.com/.extraheader 2023-01-11T22:58:56.8943595Z Entering 'third_party/cutlass' 2023-01-11T22:58:56.8968682Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9009168Z Entering 'third_party/eigen' 2023-01-11T22:58:56.9034873Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9069186Z Entering 'third_party/fbgemm' 2023-01-11T22:58:56.9093830Z http.https://github.com/.extraheader 
2023-01-11T22:58:56.9126651Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T22:58:56.9151075Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9182913Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T22:58:56.9206503Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9239636Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T22:58:56.9263803Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9295655Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T22:58:56.9319395Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9353206Z Entering 'third_party/flatbuffers' 2023-01-11T22:58:56.9377885Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9412271Z Entering 'third_party/fmt' 2023-01-11T22:58:56.9436831Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9469014Z Entering 'third_party/foxi' 2023-01-11T22:58:56.9493593Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9526354Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T22:58:56.9551344Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9584443Z Entering 'third_party/gloo' 2023-01-11T22:58:56.9608615Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9640996Z Entering 'third_party/googletest' 2023-01-11T22:58:56.9665946Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9698647Z Entering 'third_party/ideep' 2023-01-11T22:58:56.9723148Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9754778Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T22:58:56.9778859Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9813201Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T22:58:56.9837453Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9876938Z Entering 'third_party/ios-cmake' 2023-01-11T22:58:56.9901883Z http.https://github.com/.extraheader 2023-01-11T22:58:56.9933869Z Entering 'third_party/ittapi' 2023-01-11T22:58:56.9958599Z 
http.https://github.com/.extraheader 2023-01-11T22:58:56.9991049Z Entering 'third_party/kineto' 2023-01-11T22:58:57.0016239Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0047942Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T22:58:57.0073253Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0106627Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T22:58:57.0130562Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0164452Z Entering 'third_party/nccl/nccl' 2023-01-11T22:58:57.0190624Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0224871Z Entering 'third_party/neon2sse' 2023-01-11T22:58:57.0249275Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0281342Z Entering 'third_party/nlohmann' 2023-01-11T22:58:57.0306388Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0340518Z Entering 'third_party/onnx' 2023-01-11T22:58:57.0364846Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0411003Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T22:58:57.0435776Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0467946Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T22:58:57.0491999Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0526477Z Entering 'third_party/onnx-tensorrt' 2023-01-11T22:58:57.0551565Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0583326Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T22:58:57.0607823Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0645655Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T22:58:57.0670524Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0703924Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T22:58:57.0728941Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0761421Z Entering 
'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T22:58:57.0787101Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0826196Z Entering 'third_party/pocketfft' 2023-01-11T22:58:57.0850591Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0882938Z Entering 'third_party/protobuf' 2023-01-11T22:58:57.0908481Z http.https://github.com/.extraheader 2023-01-11T22:58:57.0944698Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T22:58:57.0968403Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1000711Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T22:58:57.1025411Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1059798Z Entering 'third_party/psimd' 2023-01-11T22:58:57.1084430Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1116846Z Entering 'third_party/pthreadpool' 2023-01-11T22:58:57.1142261Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1174514Z Entering 'third_party/pybind11' 2023-01-11T22:58:57.1198585Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1230930Z Entering 'third_party/python-enum' 2023-01-11T22:58:57.1255668Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1287360Z Entering 'third_party/python-peachpy' 2023-01-11T22:58:57.1313894Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1346000Z Entering 'third_party/python-six' 2023-01-11T22:58:57.1370539Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1402606Z Entering 'third_party/sleef' 2023-01-11T22:58:57.1427849Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1461041Z Entering 'third_party/tbb' 2023-01-11T22:58:57.1485736Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1519905Z Entering 'third_party/tensorpipe' 2023-01-11T22:58:57.1544865Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1577461Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T22:58:57.1600722Z http.https://github.com/.extraheader 
2023-01-11T22:58:57.1633160Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T22:58:57.1658375Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1689847Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T22:58:57.1714102Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1746426Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T22:58:57.1770616Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1802526Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T22:58:57.1827293Z http.https://github.com/.extraheader 2023-01-11T22:58:57.1862686Z Entering 'third_party/zstd' 2023-01-11T22:58:57.1887251Z http.https://github.com/.extraheader 2023-01-11T22:58:57.2194262Z Cleaning up orphan processes